This final progress report summarizes the main recent results; full reports of the
results  are  contained  in  the  papers  appended  herewith.  The  summary  also 
reviews  some  results  from  previous  AFOSR  grants  where  these  are  necessary 
to  provide  the  background  for  the  current  research.  Four  areas  are 
summarized: 

1.  Basic  Mechanisms  of  Visual  Motion  and  Texture  Perception 

2. Lateral Interactions in Texture Stimuli

3.  Information  Processing 

4.  Visual  Attention  and  Short-Term  Memory. 


1.  Basic  Mechanisms  of  Visual  Motion  and  Texture  Perception 

This  project  concerned  the  discovery  and  description  of  basic  mechanisms  of  human  visual 
motion and texture perception. Motion and texture are critical inputs to visual perception. Basic
mechanisms of motion are of particular interest because they are perhaps the primary substrate for
perceptual recovery of 3D depth structure and orientation in space, because they are critical for
detecting new objects and events in the environment, and because they play an important role in
2D perception.

Motion  and  texture  are  considered  together  here  because  the  problem  of  discriminating 
velocity  in  a  one-dimensional  motion  stimulus  is  formally  equivalent  to  the  problem  of 
discriminating  orientation  in  a  texture  stimulus:  the  t  dimension  of  the  motion  stimulus  becomes 
the  y  dimension  of  the  texture  stimulus. 


First-Order  Motion  Perception 


First-order  motion  perception.  The  initial  studies,  carried  out  at  the  inception  of  AFOSR 
support,  succeeded  in  describing  the  basic  mechanism  of  human  Fourier  motion  perception  in  full 
mathematical  detail.  Several  critical  insights  made  this  possible.  The  most  important  was 
recognizing  that  the  failure  of  previous  theoretical  attempts  to  apply  Reichardt  (1957)  and  similar 
systems  models  to  human  vision  (e.g.  Foster,  1971)  was  due  in  large  measure  to  the  fact  that  they 
had  dealt  with  data  obtained  with  high-contrast  visual  stimuli.  The  human  motion-processing 
system  behaves  in  a  simple  way  for  stimuli  whose  contrast  is  less  than  about  0.04  to  0.05  (e.g. 
Nakayama & Silverman, 1985, and others). For higher contrasts, early nonlinearities in the visual
system make the analysis of motion processing enormously more complex. Additionally, because
hundreds of thousands of detectors may contribute to human psychophysical responses, formal
models need to explicitly model decision processes. Finally, stimuli needed to be developed that
permitted conclusions about basic motion computations independent of the voting/decision rules
imposed by higher-order processes.

van Santen & Sperling (1984) was perhaps the first successful application of these basic
principles of first-order motion perception to humans, principles that are now quite widely accepted.
van Santen & Sperling (1985) showed the equivalence of two subsequent models (Adelson &
Bergen; Watson & Ahumada) to the van Santen-Sperling version of the Reichardt model and
developed new results.

van  Santen,  J.  P.  H.  and  Sperling,  G.  (1984),  Temporal  covariance  model  of  human  motion 
perception.  Journal  of  the  Optical  Society  of  America  -  A,  1,  451-473. 

The first of two papers by van Santen and Sperling reports that an elaborated version of a
Reichardt model previously proposed for insect vision gives an excellent account
of human psychophysical data for low-contrast stimuli. Applying a Reichardt detector to human
vision requires considering voting rules (e.g., absolute maximum or total power) for detectors
because many detectors present possibly conflicting information to the decision stage. There is a
full mathematical development of the elaborated theory. Many counter-intuitive predictions were
generated by the theory, and three were experimentally tested. (1) A superimposed stationary
grating, even one of the same spatial frequency as a moving grating, should not adversely
affect motion-direction discrimination. (2) Similarly, a stationary flickering grid should not
affect motion discrimination of a moving stimulus with a different temporal frequency. When the
temporal frequencies of the moving and masking stimuli are the same, then anything may happen,
even an illusion of motion in the opposite direction. This apparent reversal of direction of the
moving grating for certain predictable phase relations of the masking stimulus was demonstrated
experimentally.  (3)  For  certain  spatially-sampled  displays,  the  strength  of  a  motion  percept  is 
directly  proportional  to  the  product  of  the  contrast  in  adjacent  regions.  All  three  predictions  were 
verified.  These  data  show  that,  contrary  to  "logical  intuition,"  human  motion  detection  does  not 
rely  on  matching  spatial  features  in  successive  frames,  but  rather  on  matching  of  temporal 
sequences  in  adjacent  locations. 
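
The logic of predictions (1) and (3) can be illustrated with a minimal numerical sketch. It is not
the elaborated model of the paper: the pure quarter-cycle delay standing in for the detector's
temporal filter, the quarter-cycle spatial phase step between the two input locations, and all
function names are assumptions chosen for illustration.

```python
import math

PERIOD = 64                 # samples per temporal cycle (assumed)
DELAY = PERIOD // 4         # quarter-cycle delay, a stand-in for the temporal filter
PHASE_STEP = math.pi / 2    # spatial phase difference between the two inputs (assumed)


def grating(contrast, direction, t, location):
    """Drifting sinusoid sampled at input location 0 or 1; direction is +1 or -1."""
    return contrast * math.sin(2 * math.pi * t / PERIOD - direction * PHASE_STEP * location)


def reichardt(signal, n_cycles=4):
    """Time-averaged opponent output: (delayed left x right) minus (delayed right x left)."""
    n = PERIOD * n_cycles
    total = 0.0
    for t in range(n):
        left_now, right_now = signal(t, 0), signal(t, 1)
        left_old, right_old = signal(t - DELAY, 0), signal(t - DELAY, 1)
        total += left_old * right_now - right_old * left_now
    return total / n


def moving(t, loc):
    return grating(0.04, +1, t, loc)


def reverse(t, loc):
    return grating(0.04, -1, t, loc)


def with_stationary_grating(t, loc):
    # Stationary pedestal of the same spatial frequency: constant in time at each location.
    return moving(t, loc) + 0.04 * math.sin(1.0 - PHASE_STEP * loc)


def unequal_contrast(t, loc):
    # Different contrasts at the two sampled locations.
    return grating(0.02 if loc == 0 else 0.04, +1, t, loc)


print(reichardt(moving), reichardt(reverse))
print(reichardt(with_stationary_grating), reichardt(unequal_contrast))
```

In this toy setting the sign of the output follows the direction of drift, the stationary pedestal
leaves the time-averaged output unchanged, and the output scales with the product of the two
input contrasts.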

van  Santen,  J.  P.  H.  and  Sperling,  G.  (1985)  Elaborated  Reichardt  detectors.  Journal  of  the 
Optical  Society  of  America  -  A,  2,  300-321. 

This  paper  extends  the  predictive  power  of  the  elaborated  Reichardt  model  from  continuous 
to  two-flash  stimuli,  and  to  other  displays,  such  as  random  dot  displays,  that  had  previously  been 
thought  to  require  "feature"  models.  It  points  out  that  the  Reichardt  model  is  consistent  with  a  3D 
spatiotemporal  Fourier  analysis  of  visual  displays.  However,  when  complex  displays  contain 
several  Fourier  components  of  approximately  equal  perceptual  strength,  a  more  complex  analysis 
such  as  that  of  the  elaborated  Reichardt  model,  is  needed  to  generate  predictions.  For  example, 
displays in which the Fourier components move in the same direction and at the same
temporal frequency exhibit more convincing movement than displays in which the components
move at the same velocity so as to preserve 2D rigidity. It was proved that, for elaborated Reichardt
detectors, the strength of motion in two-flash displays is predicted by separable temporal and
spatial  components,  so  that  these  displays  are  ideal  for  studying  the  pure  spatial  properties  of 
motion  detectors.  Finally,  it  was  proved  that  two  alternative  computational  theories  (Adelson  & 
Bergen,  1985  and  Watson  &  Ahumada,  1985)  for  which  no  experimental  data  had  yet  been 
generated,  were  computationally  equivalent  to  the  elaborated  Reichardt  model. 

Investigations of Second-Order Motion and Texture

The theoretical analysis and experimental evidence described above establish an elaborated
Reichardt detector (or an equivalent kind of motion computation) as the basic mechanism of motion
perception. The work of the current granting period dealt with a newly discovered second
mechanism of motion perception, which was called "Second-order" or "Non-Fourier" motion
processing to distinguish it from the previously described "First-order" or "Fourier" motion
perception.  The  computational  principles  that  applied  to  second-order  motion  perception  were 
found  also  to  apply  to  the  perception  of  two-dimensional  textures. 

Chubb,  Charles,  and  George  Sperling.  (1988).  Drift-balanced  random  stimuli:  A  general  basis  for 
studying  non-Fourier  motion  perception.  Journal  of  the  Optical  Society  of  America  A:  Optics  and 
Image  Science,  5,  1986-2006. 

This  paper  sets  forth  the  general  principles.  It  shows  how  to  construct  counterexamples  to 
first-order  motion  computations:  visual  stimuli  which  (i)  are  consistently  perceived  as  obviously 
moving  in  a  fixed  direction,  yet  for  which  (ii)  Fourier  domain  energy  analysis  yields  no  systematic 
motion components in any given direction. A general theoretical framework is provided for
investigating nonFourier (second-order) motion-perception mechanisms; two central concepts are
driftbalanced and microbalanced random stimuli. A random stimulus S is driftbalanced if its expected power
in the frequency domain is symmetric with respect to temporal frequency: that is, if the expected
power in S of every drifting sinusoidal component is equal to the expected power of the sinusoid
of the same spatial frequency, drifting at the same rate in the opposite direction. Additionally, S
is microbalanced if the result WS of windowing S by any space-time separable function W is
driftbalanced. It is proved that (i) any space/time separable random (or nonrandom) stimulus is
microbalanced; (ii) any linear combination of pairwise independent microbalanced random
stimuli is microbalanced, and any linear combination of pairwise independent driftbalanced
random stimuli is driftbalanced if the expectation of each component is zero (a uniform field); (iii)
the  convolution  of  independent  micro/driftbalanced  random  stimuli  is  micro/driftbalanced;  (iv)  the 
product  of  independent  microbalanced  random  stimuli  is  microbalanced.  Examples  are  provided 
of  classes  of  driftbalanced  random  stimuli  which  display  consistent  and  compelling  motion  in  one 
direction  although  they  would  be  completely  ambiguous  to  any  first-order  motion  mechanism. 
The  perception  of  nonFourier  motion  stimuli  is  explained  by  postulating  a  linear  space-invariant 
filter  followed  by  a  rectifying  mechanism  that  computes  (any  increasing  function  of)  the  absolute 
value  of  stimulus  contrast  followed  by  Fourier-energy  (e.g.,  Reichardt)  motion  analysis.  All  the 
results  and  examples  from  the  domain  of  motion  perception  are  transposable  to  and  illustrated  in 
the  space-domain  problem  of  detecting  orientation  in  texture  patterns. 
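
A toy example may make the role of rectification concrete. The sketch below is only in the spirit
of the stimuli described above and is not one of the paper's constructions: a single bar steps
rightward one position per frame while its contrast polarity is chosen independently on each
frame. The opponent frame-to-frame correlation used as "standard motion analysis", the movie
size, and all names are assumptions made for illustration. The expectation is computed exactly by
enumerating every polarity sequence.

```python
from itertools import product

N_FRAMES = 6  # assumed movie length; the bar occupies position t on frame t


def bar_movie(polarities):
    """Frames of a bar stepping rightward; frame t holds polarities[t] at x == t, else 0."""
    n = len(polarities)
    return [[polarities[t] if x == t else 0 for x in range(n)] for t in range(n)]


def net_motion(movie):
    """Rightward minus leftward correlation between successive frames, summed over neighbours."""
    total = 0
    for t in range(1, len(movie)):
        prev, cur = movie[t - 1], movie[t]
        for x in range(len(cur) - 1):
            total += prev[x] * cur[x + 1] - prev[x + 1] * cur[x]
    return total


def rectify(movie):
    """Full-wave rectification: absolute value of contrast at every point."""
    return [[abs(value) for value in frame] for frame in movie]


# Enumerate all equally likely polarity sequences to obtain exact expectations.
all_movies = [bar_movie(p) for p in product((-1, 1), repeat=N_FRAMES)]
expected_raw = sum(net_motion(m) for m in all_movies) / len(all_movies)
expected_rectified = sum(net_motion(rectify(m)) for m in all_movies) / len(all_movies)

print(expected_raw, expected_rectified)
```

Applied directly to contrast, the correlator has no expected preference for either direction;
applied after full-wave rectification, it signals consistent rightward motion.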

Chubb,  Charles,  and  George  Sperling.  (1989).  Second-order  motion  perception:  Space-time 
separable  mechanisms.  Proceedings:  Workshop  on  Visual  Motion.  (March  20-22,  1989,  Irvine, 
California.)  Washington,  D.C:  IEEE  Computer  Society  Press.  Pp.  126-138. 

This  paper  shows  how  various  classes  of  microbalanced  displays  can  be  used  to  derive 
properties  of  second-order  motion  systems.  Microbalanced  stimuli  are  dynamic  displays  which  do 
not  stimulate  mechanisms  that  apply  standard  motion  analysis  directly  to  luminance  (e.g., 
Adelson-Bergen  motion-energy  analyzers,  Watson-Ahumada  motion  sensors,  or  elaborated 
Reichardt  detectors.)  Because  they  bypass  first-order  mechanisms,  microbalanced  stimuli  are 
uniquely useful for studying second-order motion perception (motion perception served by
mechanisms  that  require  a  grossly  nonlinear  stimulus  transformation  prior  to  standard  analysis). 
The  paper  demonstrates  stimuli  that  are  microbalanced  under  all  pointwise  stimulus 
transformations  and  therefore  immune  to  early  visual  nonlinearities.  Such  stimuli  are  used  to 
disable  motion  information  derived  from  spatial  filtering  in  order  to  isolate  the  temporal  properties 
of  space/time  separable  second-order  motion  mechanisms.  They  are  equally  useful  to  disable  the 
motion  information  derived  from  temporal  filtering  to  isolate  the  spatial  properties. 

The paper proposes that second-order motion in all of the classes of microbalanced stimuli
under consideration can be extracted by a mechanism consisting of the following stages: (1a)
band-selective spatial filtering and (1b) biphasic temporal filtering, nonzero in dc, followed by (2)
a rectifying nonlinearity and (3) standard motion analysis.

Chubb,  Charles,  and  George  Sperling.  (1989).  Two  motion  perception  mechanisms  revealed  by 
distance  driven  reversal  of  apparent  motion.  Proceedings  of  the  National  Academy  of  Sciences, 
USA,  86,  2985-2989. 

It  is  reasonable  to  ask  whether  there  really  are  two  mechanisms  of  motion  perception  or 
whether one theory can encompass both. One way to demonstrate the existence of two
mechanisms  is  to  stimulate  them  to  simultaneously  give  opposite  outputs  in  response  to  the  same 
stimulus.  This  paper  demonstrates  two  kinds  of  visual  stimuli  that  exhibit  motion  in  one  direction 
when  viewed  from  near  and  in  the  opposite  direction  from  afar.  These  striking  reversals  occur 
because  each  kind  of  stimulus  is  constructed  to  simultaneously  activate  two  different  mechanisms: 
a  short-range  mechanism  that  computes  motion  from  space-time  correspondences  in  stimulus 
luminance  and  a  long-range  mechanism  whose  motion  computations  are  performed,  instead,  on 
stimulus  contrast  that  has  been  full-wave  rectified  (e.g.,  the  absolute  value  of  contrast).  The 
stimuli  were  constructed  so  that  half-wave  rectification  could  be  excluded.  It  is  concluded  that 
both a Fourier and a nonFourier computation occur. In this and all previously studied cases of
second-order motion perception, full-wave rectification has been shown to be a sufficient mechanism;
for these stimuli, full-wave rectification (versus half-wave rectification) is shown to be necessary.

An  analogous  phenomenon,  distance-driven  reversal  of  apparent  slant,  occurs  with  texture 
stimuli.  Apparently,  in  both  motion  and  texture  extraction  from  visual  scenes,  there  are  two 
parallel  mechanisms,  operating  simultaneously,  a  first-order  mechanism  that  operates  directly  on 
the  Fourier  components  of  the  stimulus,  and  a  second-order  mechanism  that  operates  on  a 
spatiotemporally  filtered,  full-wave  rectified  transformation  of  the  stimulus. 

Chubb,  Charles,  and  George  Sperling.  (1991).  Texture  quilts:  Basic  tools  for  studying  motion- 
from-texture.  Journal  of  Mathematical  Psychology,  35,  411-442. 

This  paper  continues  the  investigation  of  motion-from-spatial-texture  in  stimuli  that  are  free 
from  contamination  by  motion  mechanisms  sensitive  to  anything  except  texture.  It  offers  a  formal 
foundation  for  some  of  the  results  outlined  in  Chubb  &  Sperling’s  (1989)  IEEE  paper,  and  reports 
the  results  of  three  demonstration  experiments  that  establish  empirical  properties  of  human 
second-order  motion  perception.  Additionally,  some  concrete  stimulus-construction  methods  are 
provided  for  a  special  class  of  random  stimuli  called  texture  quilts.  Although,  as  is  demonstrated 
experimentally,  certain  texture  quilts  display  consistent  apparent  motion,  it  is  proven  that  their 
motion content (a) is unavailable to standard motion analysis (such as might be accomplished by
an  Adelson/Bergen  motion-energy  analyzer,  a  Watson/Ahumada  motion  sensor,  or  by  any 
elaborated  Reichardt  detector),  and  (b)  cannot  be  exposed  to  standard  motion  analysis  by  any 
purely  temporal  signal  transformation  no  matter  how  nonlinear  (e.g.,  temporal  differentiation 
followed  by  rectification).  Applying  such  a  purely  temporal  transformation  to  any  texture  quilt 
produces  a  spatiotemporal  function  P  whose  motion  is  unavailable  to  standard  motion  analysis: 
The  expected  response  of  every  Reichardt  detector  to  P  is  0  at  every  instant  in  time. 

Three  quilts  were  studied  experimentally:  a  quilt  that  relies  on  differences  in  spatial 
frequency  to  generate  perception  of  motion,  a  quilt  that  relies  on  sensitivity  to  differences  in 
orientation, and a quilt that relies on the difference between an even texture and a
jointly-independent random texture. The simplest mechanism sufficient to sense the motion exhibited by
texture quilts consists of three successive stages: (i) a purely spatial linear filter, (ii) a rectifier to
transform  regions  of  large  negative  or  positive  responses  into  regions  of  high  positive  values,  and 
(iii)  standard  motion  analysis.  The  first  quilt  demonstrates  that  the  spatial  filter  is  frequency 
selective.  The  second  quilt  demonstrates  that  there  exist  orientation  selective  filters.  The  third 
quilt  demonstrates  that  the  rectifier  cannot  embody  a  perfect  squaring  (power)  function. 

Werkhoven,  Peter,  George  Sperling,  and  Chubb,  Charles.  (1993).  The  dimensionality  of  texture- 
defined  motion:  A  single  channel  theory.  Vision  Research,  33,  463-485. 

This  paper  explores  texture-defined  motion  between  similarly  oriented  sinusoidal  patches.  It 
exploits  two  ambiguous  motion  displays  (types  I  and  II)  in  each  of  which  apparent  motion  can  be 
perceived  in  either  of  two  directions.  One  of  these  directions  is  along  a  homogeneous  space-time 
path  in  which  all  successive  sinusoidal  patches  are  identical  in  spatial  frequency  and  contrast. 
The other, oppositely directed path is composed of heterogeneous patches that vary in
spatial  frequency  and  contrast.  The  striking  and  counterintuitive  result  is  that  for  a  wide  variety  of 
display  conditions,  perceived  motion  along  the  heterogeneous  path  dominates  the  homogeneous 
path.  Obviously,  when  perceived  motion  along  a  path  composed  of  alternating  high-  and  low- 
frequency  patches  dominates  perceived  motion  along  a  pure  high-frequency  path,  the  strength  of 
texture-defined  motion  is  not  governed  by  a  similarity  metric. 

All  the  results  are  explained  in  terms  of  an  activity  transformation.  Each  patch  is  assumed 
to  cause  a  perceptual  response  (activity).  Strength  of  perceived  motion  along  a  path  is  determined 
by  the  product  of  the  activities  of  adjacent  patches  along  the  path.  The  path  with  the  greatest 
product  dominates. 
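
The product rule can be written out in a few lines. This is a hedged illustration only: the
activity values are invented, and treating a path as a single transition between two patches is a
simplifying assumption rather than the paper's full analysis.

```python
def transition_strength(first_activity, second_activity):
    """Motion strength for one step between two patches: the product of their activities."""
    return first_activity * second_activity


def dominant_path(paths):
    """Return the name of the path whose transition has the greatest activity product."""
    return max(paths, key=lambda name: transition_strength(*paths[name]))


# Hypothetical activities: the dissimilar patch happens to evoke the larger response.
paths = {
    "homogeneous": (0.5, 0.5),    # a patch followed by an identical patch
    "heterogeneous": (0.5, 1.0),  # the same patch followed by a dissimilar patch
}

print(dominant_path(paths))
```

Under these assumed activities the heterogeneous path wins, because only the product of
activities matters and the similarity of the two patches plays no role.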

Whenever  a  particular  combination  of  patch  contrasts  and  spatial  frequencies  caused  the  two 
motion  paths  to  be  balanced  in  displays  of  type  I,  then  they  were  found  to  be  also  balanced  in 
type  II  displays,  a  condition  referred  to  as  transition  invariance.  Under  quite  reasonable 
assumptions  about  the  motion  mechanism,  it  was  shown  that  transition  invariance  implies  that 
activity  must  be  a  one-dimensional  quantity.  Indeed,  activity  is  well-described  as  the  rectified 
output  of  a  spatial  low-pass  filter. 

Werkhoven,  Peter,  Charles  Chubb,  and  George  Sperling.  (1994)  Perception  of  Apparent  Motion 
between  Dissimilar  Gratings:  Spatiotemporal  Properties.  Vision  Research.  (Accepted  for 
publication  pending  revisions.) 

This paper continues the search for the determinants of the perceptual strength of texture-
defined motion (i.e., motion strength of stimuli that have no net directional energy in the Fourier
domain). Werkhoven, Sperling, & Chubb (1993) demonstrated that correspondence in spatial
frequency and contrast between neighboring patches of texture in a spatiotemporal motion path is
irrelevant to motion strength; only activity (the rectified output of a spatial lowpass
filter) mattered. As in Werkhoven et al. (1993), the motion stimuli are ambiguous motion
displays  in  which  one  motion  path,  consisting  of  patches  of  nonsimilar  texture,  competes  with 
another  motion  path,  having  patches  only  of  similar  texture.  The  textural  parameters  of  spatial 
frequency,  contrast,  texture  orientation  (slant),  and  temporal  frequency  are  systematically  explored. 

The  data  show  that  motion  between  dissimilar  patches  of  texture  (which  are  orthogonally 
oriented, have a two-octave difference in spatial frequency, and differ 50% in contrast) can easily
dominate motion between similar patches of texture. The relative motion strengths of the two paths are
invariant  with  temporal  frequency  from  1  to  4  Hz.  Analysis  of  the  data  shows  that  the  motion 
computation  is  largely  but  not  entirely  one-dimensional:  Extreme  orientation  differences  and  very 
large  spatial  frequency  differences  bring  into  play  small  but  significant  contributions  of  a  second 
dimension  (or  dimensions). 


2.  Lateral  Interactions  in  Texture  Stimuli:  Contrast-Contrast 

Chubb,  Charles,  George  Sperling,  and  Joshua  A.  Solomon.  (1989).  Texture  interactions  determine 
perceived  contrast.  Proceedings  of  the  National  Academy  of  Sciences,  USA,  86,  9631-9635. 

Various visual illusions that have been demonstrated for first-order stimuli may be expected
to have corresponding second-order illusions. When the illusions are the result of important
properties of signal processing, such as boundary enhancement and gain control, the corresponding
second-order illusions should be quite informative about the corresponding second-order process.
This paper considers the second-order analog to perhaps the most famous first-order lightness
illusion, namely that the apparent lightness of a uniformly illuminated patch depends on the
luminance of its surround. Here it is reported that the perceived contrast of a test patch P of
binary visual noise embedded in a surrounding noise field S depends substantially on the contrast
of S. When P is surrounded by high-contrast noise, its bright points appear dimmer and,
simultaneously, its dark points appear less dark than when P is surrounded by a uniform field,
even though local mean luminance is kept constant across all displays. Sinusoidally modulating
the contrast C_S of the noise surround S causes the apparent contrast of P to modulate in antiphase
to C_S. For P of contrast C_P, nulling procedures show that the induced contrast
modulation of P reaches 0.45 C_P. This very large, heretofore unnoticed spatial interaction is
unanticipated by all current theories of lightness perception. It suggests a very general principle
of perceptual computation: gain control. Gain control may be a nearly universal process
whereby the response of each detector is normalized relative to the responses of its neighbors in
the same and similar classes.
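
One simple way to picture such normalization is a divisive gain control. The functional form and
the weight below are illustrative assumptions, not a model fitted in the paper; the sketch only
shows that a higher surround contrast lowers the apparent contrast of an unchanged center.

```python
def apparent_contrast(center_contrast, surround_contrast, weight=1.0):
    """Divisive gain control: the center response shrinks as surround contrast grows.

    Both the divisive form and `weight` are assumptions made for illustration.
    """
    return center_contrast / (1.0 + weight * surround_contrast)


uniform_surround = apparent_contrast(0.5, 0.0)  # surround is a uniform field
noisy_surround = apparent_contrast(0.5, 1.0)    # surround is high-contrast noise

print(uniform_surround, noisy_surround)
```

With this form, raising and lowering the surround contrast drives the center's apparent contrast
in the opposite direction, which is the antiphase relation described above.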

Joshua  A.  Solomon  and  George  Sperling.  (1993).  The  lateral  inhibition  of  perceived  contrast  is 
indifferent  to  on-center/off-center  segregation  but  specific  to  orientation.  Vision  Research,  33, 
2671-2683.

Chubb, Sperling, and Solomon (1989) showed that the perceived contrast of a test patch C of
isotropic spatial texture embedded in a surrounding texture field S depends substantially on the
contrast of the texture surround S. When C is surrounded by a high-contrast texture with a
similar spatial frequency content, it appears to have less contrast than when it is surrounded by
a uniform field. This paper describes two novel textures: T+, which is designed to selectively
stimulate only the on-center system, and T-, the off-center system. When the textures of C and of S
are each chosen to be T+ or T-, the reduction of C’s apparent contrast does not vary with the
combination of T+ and T-. This demonstrates that the reduction of C’s apparent contrast is mediated
by a mechanism whose neural locus is central to the interaction between on-center and off-center
visual systems.

The induced reduction of apparent contrast is shown to be orientation specific: the
reduction of grating C’s apparent contrast by a surround grating S of the same spatial frequency
is greatest when C and S have equal orientation. Using dynamically phase-shifting sinusoidal
gratings of 3.3, 10 and 20 cpd, the reduction of apparent contrast was measured using different
contrast combinations of C and S.

The results: (1) Both parallel and orthogonal S gratings caused suppression of C’s apparent
contrast relative to a uniform surround. (2) In all of the viewing conditions, the reduction of
apparent  contrast  induced  by  the  parallel  surrounds  was  at  least  as  great  as  that  induced  by  the 
perpendicular  surrounds.  Often  it  was  much  greater  (orientation  specificity).  (3)  Orientation 
specificity  increased  with  greater  spatial  frequencies  and  with  lower  stimulus  contrasts.  The 
results  suggest  a  contrast  perception  mechanism  in  which  both  oriented  and  nonoriented  units 
determine  the  perceived  lightness  or  darkness  of  a  point  in  visual  space,  and  every  unit  is 
inhibited  primarily  by  similar  adjacent  units. 


3.  Information  Processing:  Frequency  Bands,  Subsampling,  Noise;  Space  and  Object  Perception 

This  cluster  of  projects  determined,  in  several  domains,  how  to  most  efficiently  package 
information  to  an  observer.  Obviously,  issues  of  external  representation  of  information  are 
inextricably  tied  to  the  question  of  "What  internal  representation  does  the  observer  use?"  Such 
investigations  may  lead  to  useful  formulations  of  how  to  improve  both  information  presentation 
and  observer  training.  The  basic  method  was  to  partition  the  total  stimulus  information  into 
several  spatial  frequency  bands,  and  to  determine  performance  individually  for  the  component 
bands. Additionally, Riedl and Sperling studied cross-band masking and measured how
information  from  component  frequency  bands  combines  in  a  complex,  dynamic  visual  stimulus. 

The "Three-stages and two systems" paper in this sequence proposes a theoretical analysis
of the basic computations of visual preprocessing. It shows how results from motion and texture
discrimination experiments derive from the same mechanisms that serve higher-order object
perception.  The  eye  movement  paper  in  this  sequence  deals  with  the  internal  representation  of 
scenes  that  derive  from  a  sequence  of  saccadic  eye  movements,  and  with  the  visual  mechanisms 
that  serve  the  saccadic  mode  of  information  acquisition. 

Sperling, Wurst & Lu deal with a new method of discriminating early from late attentional
filtering of features that occur at a single location. Their paradigm, which was applied to a
repetition detection task, is easily extended to visual search, and this forms the basis of the
proposed experiments.


Riedl, Thomas R. and George Sperling. Spatial frequency bands in complex visual stimuli:
American Sign Language. Journal of the Optical Society of America A: Optics and Image Science,
1988, 5, 606-616.

This project examined dynamic images of individual signs of American Sign Language
(ASL) with a resolution of 96 x 64 pixels which were bandpass filtered in adjacent frequency
bands. Intelligibility was determined by testing deaf subjects fluent in ASL. It was possible
to find four adjacent bands which divided the signal into approximately equally intelligible parts,
any one of which yielded adequate identification accuracy. (a) By iteratively varying the center
frequencies and bandwidths of the spatial bandpass filters, it was possible to divide the original
signal into four different component bands of high intelligibility (67-87% for isolated ASL signs).
(b) The empirically measured temporal frequency spectrum was approximately the same in all
bands. (c) The masking of signals in band i by noise in band j was found to be proportional to
the frequency similarity: log(f_noise / f_signal). At constant performance,
(RMS)_signal / (RMS)_noise was the same for bands 2, 3, 4 and higher for band 1. (d) The most
effective masking noise is slightly lower in spatial frequency than the stimulus (Δω = 1.4). (e)
Intelligibility for the sum of two very weak signals is greater the closer they are in spatial
frequency; for strong signals, the reverse is true. The dominant factor for weak signals is
square-law additivity of signal power; for strong signals, redundancy within a band is the limiting
factor.


Parish,  David  H.  and  George  Sperling.  Object  spatial  frequencies,  retinal  spatial  frequencies, 
noise,  and  the  efficiency  of  letter  discrimination.  Vision  Research,  1991,  31,  1399-1415. 

The  26  upper-case  letters  of  English  were  used  to  determine  which  spatial  frequencies  are 
most  effective  for  letter  identification,  and  whether  this  is  because  letters  are  objectively  more 
discriminable  in  these  frequency  bands  or  because  observers  can  utilize  the  information  more 
efficiently.  Six  two-octave  wide  filters  produced  spatially  filtered  letters  with  2D-mean 
frequencies  ranging  from  0.4  to  20  cycles  per  letter  height.  Subjects  attempted  to  spatially  filtered 
letters  in  the  presence  of  identically  filtered,  added  Gaussian  noise.  The  percent  of  correct  letter 
identifications  was  measured  as  a  function  of  s/n  in  each  band  at  each  of  four  viewing  distances 
ranging  over  32: 1 .  In  this  paradigm,  object  spatial  frequency  band  and  s  In  determine  presence  of 
information  in  the  stimulus;  viewing  distance  determines  retinal  spatial  frequency,  and  affects  only 
ability  to  utilize,  (a)  Viewing  distance  had  no  effect  upon  letter  discriminability:  object  spatial 
frequency,  not  retinal  spatial  frequency,  determined  discriminability.  (b)  With  the  assistance  of 
Charles  Chubb,  an  ideal  detector  was  computed  for  the  letter  identification  task.  For  these  two- 
octave  wide  bands,  s/n  performance  of  humans  and  of  the  ideal  detector  improved  with  frequency 
mainly  because  linear  bandwidth  increased  as  a  function  of  frequency,  (c)  Human  discrimination 
efficiency (which compares human discrimination to an ideal discriminator) was 0 in the lowest
frequency bands, reached a maximum of 0.42 at 1.5 cycles per object, and dropped to about 0.104
in the highest band. (d) Upper case letter information is best extracted from spatial frequencies of
1.5 cycles per object height, and with equally high efficiency over at least a 32:1 range of retinal
frequencies from 0.074 to more than 2.3 cycles per degree of visual angle.


Parish,  David  H.,  George  Sperling,  and  Michael  S.  Landy.  Intelligent  temporal  subsampling  of 
American  Sign  Language  using  event  boundaries.  Journal  of  Experimental  Psychology:  Human 
Perception and Performance, 1990, 16, 282-294.

This  paper  investigates  the  effects  of  temporal  stimulus  subsampling  and  the  form  of 
stimulus  representation  on  intelligibility  of  a  complex  visual  stimulus  (American  Sign  Language). 
How  well  can  a  sequence  of  ASL  frames  be  represented  by  a  subset  of  the  frames,  and  how  is  the 
subset  optimally  chosen?  Two  drastically  different  representations  of  frame  sequences  were 
investigated:  dynamic  (ordinary  video  viewing)  and  static  (component  frames  placed  side-by-side 
in  a  single  display).  Secondarily,  full  gray  scale  images  were  compared  with  binary  images 
(cartoons). An activity index was used to select critical frames at event boundaries: moments in
the sequence where the difference between successive frames has a local minimum. Identification
accuracy (intelligibility) was measured for 32 experienced ASL signers who viewed 84 variously
constructed sequences of isolated ASL signs. With dynamic sequences that utilized full gray
scale, activity-index subsampling yielded significantly more intelligible sequences than simple
repetition of every n-th frame, achieving relative compression ratios of up to 2:1. For static
sequences,  activity  subsampling  with  a  small,  optimal  number  of  frames  achieved  higher 
intelligibility  than  was  achieved  by  choosing  every  n-th  frame,  for  any  n.  Binary  images  were 
less  intelligible  than  the  gray  scale  images,  and  the  relative  advantage  of  activity  subsampling  was 
smaller. 
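
The event-boundary selection described above can be sketched as follows. This is only a minimal illustration, not the procedure of the paper: the activity index is assumed here to be the mean absolute pixel difference between successive frames, and the function names are hypothetical.

```python
import numpy as np

def activity_index(frames):
    """Mean absolute pixel difference between each pair of successive frames.

    frames: array of shape (n_frames, height, width).
    Returns an array of length n_frames - 1; entry i compares frame i with i + 1.
    (Assumed definition, chosen for illustration.)
    """
    frames = np.asarray(frames, dtype=float)
    diffs = np.abs(np.diff(frames, axis=0))
    return diffs.reshape(diffs.shape[0], -1).mean(axis=1)

def event_boundaries(activity):
    """Indices at which the activity index has a strict local minimum.

    Plateaus (runs of equal values) are not detected by this simple test.
    """
    a = np.asarray(activity, dtype=float)
    interior = np.arange(1, len(a) - 1)
    is_min = (a[interior] < a[interior - 1]) & (a[interior] < a[interior + 1])
    return interior[is_min]

# Toy sequence of six 1x1 "frames" whose successive differences are 5, 1, 4, 0.5, 4.5.
toy = np.array([0.0, 5.0, 6.0, 10.0, 10.5, 15.0]).reshape(6, 1, 1)
boundaries = event_boundaries(activity_index(toy))
```

In the toy sequence the local minima of the activity index fall at indices 1 and 3, so frames near those indices would be the ones retained in the subsample.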

(1)  Event  boundaries  can  be  defined  computationally.  Sequences  composed  of  frames 
chosen  from  event  boundaries  yielded  higher  intelligibility  than  sequences  composed  of  equal 
numbers  of  frames  spaced  at  regular  intervals.  (2)  Static  presentation  of  subsets  of  selected 
frames  can  yield  intelligible  ASL  "text"  of  isolated  signs  and  perhaps,  eventually,  of 
conversational  ASL. 


This  research  opens  the  general  question  of  how  to  use  printing  technology  in  place  of  video 
technology,  where  the  printing  technology  is  enhanced  at  the  point  of  production  by  computer 
graphics techniques. How can an automatically generated sequence of images best be used, like
a comic book, to represent a dynamic sequence of events? When an artist is required to
represent  the  images  for  eventual  printing,  the  cost  can  be  prohibitive.  When  the  images  can  be 
automatically  generated  from  a  video  recording,  the  production  costs  are  minor.  The  ASL  study 
demonstrates  the  feasibility  of  representing  a  dynamic  ASL  sign  by  a  simultaneously  visible 
packet  of  images.  Research  is  needed  to  determine  how  these  results  might  be  generalized  to 
more  complex  communications  and  to  practical  training  problems  that  involve  dynamic  actions. 


Sperling, George. Three stages and two systems of visual processing. Spatial Vision, 1989, 4,
183-207.

This paper offers a theoretical synthesis of classic work on light adaptation and on visual
thresholds for pattern stimuli, work on efficiency of identification in various spatial frequency
bands, and work on motion and texture perception, in terms of three stages and two systems of
visual processing. The initial question is: How would an internal noise (at various levels of
perceptual processing) appear to an external observer? This is determined by the internal location of
the  noise  relative  to  three  stages  of  visual  processing:  light  adaptation,  contrast  gain  control,  and  a 
postsensory/decision  stage.  Dark  noise  occurs  prior  to  adaptation,  determines  dark-adapted 
absolute  thresholds,  and  mimics  stationary  external  noise.  Sensory  noise  occurs  after  dark 
adaptation,  determines  contrast  thresholds  for  sine  gratings  and  similar  stimuli,  and  mimics 
external  noise  that  increases  with  mean  luminance.  Postsensory  noise  incorporates  perceptual, 
decision, and mnemonic processes. It occurs after contrast-gain control and mimics external noise
that increases with stimulus contrast (i.e., external multiplicative noise). Dark noise and sensory
noise are frequency specific and primarily affect weak signals. Only postsensory noise
significantly affects the discriminability of strong signals masked by stimulus noise; postsensory
noise has constant power over a wide spatial frequency range in which sensory noise varies
enormously. Especially in dealing with modulation transfer functions, there has been considerable
confusion of the spectrum of internal sensory noise (which unavoidably depends on spatial
frequency) with the gain factor of sensory transmission (which ideally would be independent of
spatial frequency).

Two  parallel  perceptual  regimes  jointly  serve  human  object  recognition  and  motion 
perception:  a  first-order  linear  (Fourier)  regime  that  computes  relations  directly  from  stimulus 
luminance,  and  a  second-order  nonlinear  (nonFourier)  rectifying  regime  that  uses  the  absolute 
value  (or  power)  of  stimulus  contrast.  When  objects  or  movements  are  defined  by  high  spatial 
frequencies  (i.e.,  texture  carrier  frequencies  whose  wavelengths  are  small  compared  to  the  object 
size),  the  responses  of  high-frequency  receptors  are  demodulated  by  rectification  to  facilitate 
discrimination  at  the  higher  processing  levels.  Rectification  sacrifices  the  statistical  efficiency 
(noise  resistance)  of  the  first-order  regime  for  efficiency  of  connectivity  and  computation. 


Sperling,  George.  Comparison  of  perception  in  the  moving  and  stationary  eye.  In  E.  Kowler 
(Ed.), Eye Movements and their Role in Visual and Cognitive Processes. Amsterdam, The
Netherlands:  Elsevier  Biomedical  Press,  1990.  Pp.  307-351. 

This paper reports the construction of an apparatus for producing simulated saccades:
continuous sequences of images on a stationary retina that are equivalent to the images produced
on  the  retina  during  saccadic  eye  movements.  Spatial  localization  was  studied  for  stimuli  flashed 
during  real  eye  movements  (using  a  limbus  monitor)  and  during  identical  image  sequences 
(simulated  saccades)  produced  on  a  stationary  retina.  The  comparison  between  real  and  simulated 
saccades  gives  critical  insights  into  those  mechanisms  that  are  particular  to  saccades.  The  paper 
reviews  the  historically  important  paradigms  (and  representative  experiments)  that  purport  to  deal 
with  special  modes  of  saccadic  processing.  On  the  basis  of  all  these  data,  it  proposes  a  theory  to 
account  for  saccadic  simulation  experiments  and  to  deal  with  such  questions  about  human  visual 
perception  as: 

Why  don’t  we  see  the  smear  produced  on  the  retina  during  an  eye  movement? 

Why doesn’t the world appear to move as a result of the image movements produced by eye
movements?

Does the visual system require sudden stimulus onsets (such as those produced by eye
movements) to initiate processing episodes?

To  serve  the  perceptual  construction  of  a  stable  representation  of  the  world,  is  there  a  special 
memory  to  relate  images  produced  by  successive  eye  movements? 


Sperling, G., Wurst, S. A., and Lu, Z-L. (1993). Using repetition detection to define and localize
the processes of selective attention. In D. E. Meyer and S. Kornblum (Eds.), Attention and
Performance XIV: Synergies in Experimental Psychology, Artificial Intelligence, and Cognitive
Neuroscience - A Silver Jubilee. Cambridge, MA: MIT Press. Pp. 265-298.

Can subjects selectively attend to a subset of items in rapid display sequences, when the
subset is characterized by an obvious physical feature, but all items occur in the same location?
The paradigm is a repetition detection task in which subjects search a very rapidly presented
sequence of thirty superimposed frames for an item that is repeated within four frames.
Successful detection implies that a match occurs between an incoming item and a recent item
retained in short-term visual repetition memory (STVRM). Previous results (Kaufman, 1978;
Wurst,  1989)  showed  that  detection  of  visual  repetitions  in  a  rapid  stream  of  items  is  indifferent  to 
eye  of  origin  and  to  interposed  masking  fields,  and  functions  as  well  for  nonsense  shapes  as  for 
digits.  Therefore,  STVRM  is  visual,  not  verbal  or  semantic.  It  is  governed  by  interference  from 
new  items;  it  does  not  suffer  passive  decay  within  the  short  interstimulus  intervals  under  which  it 
has  been  tested. 

This  paper  uses  a  novel  elaboration  of  a  repetition  detection  paradigm.  Within  the  stream, 
the  physical  features  of  the  successive  items  alternate  in  color,  size  or  spatial  frequency.  For 
example,  in  the  size  condition,  the  odd-numbered  items  in  the  stream  are  large  and  the  even- 
numbered  items  are  small.  Subjects  attend  selectively  to  small  (or  to  large)  items.  Using 
selective  attention  instructions  with  the  repetition  detection  task  permits  testing  the  extent  to 
which,  at  a  single  location,  subjects  can  filter  rapidly-successive  items  according  to  their  physical 
characteristics.  By  presenting  all  the  items  at  the  same  location,  only  attentional  selection 
according  to  features  (and  not  according  to  location)  is  effective.  Subjects  selectively  attended  to 
subsets  of  characters  based  on  physical  differences  of  orientation,  contrast  polarity,  color,  size, 
spatial  bandpass  filtering,  and  polarity-and-size  combined. 

Results.  Efficiency  of  attentional  selection  was  determined  by  comparing  performance  in  a 
stream  of  characters  that  alternated  a  physical  feature  with  performance  in  two  control  conditions: 
One  in  which  the  to-be-unattended  characters  were  optically  filtered  and  another  in  which  all 
characters  shared  the  same  physical  feature.  Selection  efficiency  in  bandpass  filtered  streams  and 
in  the  polarity-and-size  streams  was  greater  than  50  percent.  Attentional  selection  based  on  the 
other  physical  features  was  less  effective  or  ineffective. 

Corresponding  to  the  benefits  of  attentional  selection  in  detecting  to-be-attended  repetitions, 
there  were  large  costs  in  the  detection  of  unattended  features.  Costs  were  more  ubiquitous  than 
benefits. 

In addition to studying repetitions of items that shared a physical feature (homogeneous
repetitions), heterogeneous repetitions were studied. Costs for detecting heterogeneous repetitions
(relative to homogeneous repetitions) were widespread, indicating that physical features are
represented  in  STVRM.  The  corresponding  stimulus  benefits  of  detecting  homogeneous 
repetitions  in  feature-alternating  streams  (under  equal  attention)  were  small  and  only  occasionally 
significant. 

If  the  state  of  attention  were  represented  in  STVRM,  we  would  expect  a  cost  in  the  detection 
of  heterogeneous  repetitions  with  selective  attention  instructions  (because  the  attentional  state 
would differ for the two elements of the pair). Such costs were observed and, in some instances,
they  occurred  even  when  there  was  no  corresponding  benefit  for  selective  attention  in 
homogeneous  detections.  This  was  interpreted  as  a  lack  of  early  attentional  filtering  compensated 
by  a  memory  tag  representing  whether  or  not  an  item  was  attended. 

Conclusion: The largest attentional effects occur at the level of attentional selection prior to
encoding in STVRM (for bandpass and polarity-and-size stimuli), but even when early
attentional filtering fails, attentional selection can still occur in STVRM.


4.  Visual  Attention  and  Short-Term  Memory 

Performance  in  many  visual  tasks  depends  not  only  on  characteristics  of  the  visual  system, 
but  also  on  more  cognitive  processes  involved  in  processing  visual  information,  such  as  attention 
and  memory.  The  experiments  seek  to  dissect  the  processes  involved  in  short-term  attentional 
control  and  the  corresponding  short-term  memory  systems.  The  experimental  methods  mostly 
involve  rapid  sequences  of  displays  because  our  past  work  has  shown  that  temporal  sequences  can 
be  used  to  sample  the  time  course  of  temporal  processing.  The  work  on  visual  persistence,  iconic 
memory,  and  related  phenomena  exemplifies  processing  in  the  absence  of  successive  events;  i.e., 
single-event  processing. 


Background 

The  attention  experiments  herein  and  many  prior  experiments  from  the  vast  literature  on 
visual  attention  are  encompassed  in  a  general  theoretical  framework.  The  starting  point  is  the  first 
published  demonstration  of  an  attentional  operating  characteristic  (Sperling  and  Melchner,  1976, 
1978a)  and  the  concept  of  attentional  resources  developed  by  Navon  and  Gopher  (1979),  Norman 
and  Bobrow  (1975),  and  others. 

Sperling, G. A unified theory of attention and signal detection. In R. Parasuraman and D. R.
Davies (Eds.), Varieties of Attention. New York, N. Y.: Academic Press, 1984. Pp. 103-181.

A state of attention is characterized by a particular allocation of processing and mnemonic resources,
and this allocation determines the joint performance on two (or more) competing tasks. The
Attention Operating Characteristic (AOC) is the range of possible joint performances as resource
allocation is varied from one extreme to the other. This paper demonstrates that the AOC is
generated by a process that is mathematically equivalent to the process that generates the receiver
operating characteristic (ROC) of signal detection theory (i.e., the process partitions observations
into either signal or noise response categories).

This article also proposes a formal definition of a task as a triple of two sets (stimuli and
responses) and a mapping between them (a utility function). The task definition enabled a
distinction between compound and concurrent tasks. Concurrent tasks were shown to be
especially useful in the study of attention, whereas compound tasks involved primarily the study
of decision making, and resulted in considerable difficulties when they were applied to attention.
The  utility  function  (in  the  task  definition)  is  essential  to  understanding  human  performance.  In 
contemporary,  formal  theory,  "utility"  plays  the  same  role  as  did  "purpose"  in  earlier,  informal 
accounts  of  behavior. 

Sperling,  G.,  and  B.  A.  Dosher.  (1986).  Strategy  and  optimization  in  human  information 
processing.  In  K.  Boff,  L.  Kaufman,  and  J.  Thomas  (Eds.),  Handbook  of  Perception  and 
Performance.  Vol.  1.  New  York,  NY:  Wiley,  1986.  Pp.  2-1  to  2-65. 

This  highly  condensed,  encyclopedic  treatment  of  a  large  literature  on  attention  and 
performance  is  equivalent  to  over  200  ordinary  book  pages  plus  more  than  100  figure  panels. 
Concepts  such  as  formal  task  definitions,  compound  and  concurrent  tasks,  attentional  resources, 
attentional  operating  characteristics,  and  more  generally,  strategies  to  optimize  performance,  are 
applied  to  the  interpretation  of  data  from  many  classical  paradigms.  This  yields  a  deeper 
understanding  and,  in  many  instances,  vastly  different  conclusions. 


Attentional  Trajectories. 

The  Wilson  Cloud  Chamber  and  Glaser  Bubble  Chamber,  which  are  designed  to  make 
visible  the  trajectories  of  individual  atomic  and  subatomic  particles,  work  by  populating  the 
volume within which a particle will move with steam or superheated liquid. When a target
particle moves through the chamber, a few of the molecules it strikes form the nucleus of condensing
droplets or evolving bubbles, and the visible track of these droplets or bubbles defines the
trajectory. 

Sperling  and  Reeves  (1980)  introduced  an  analogous  procedure  in  the  realm  of 
measurements  of  human  attention.  A  rapid  stream  of  superimposed  visual  items  was  presented  at 
rates  of  up  to  13  per  second  in  a  single  spatial  location.  Subjects  attended  a  second  location.  At 
a  critical  moment  during  the  sequence,  subjects  were  cued  to  execute  a  shift  of  attention  to  the 
stream location, and to report the earliest four of the items. The histogram (distribution) of the
actually  reported  items  (a  small  fraction  of  the  presented  items)  defined  the  rapid  growth  and 
subsequent  decline  of  attention  at  the  stream  location.  This  paradigm  made  it  possible  to  measure 
reaction  times  of  shifts  of  visual  attention.  Indeed,  the  paradigm  allows  the  measurement  not  only 
of  the  mean  reaction  time  of  an  attentional  shift  but  of  the  entire  density  function  of  attentional 
reaction  times  (ARTs).  Mean  ARTs  were  shown  to  be  quite  similar  to  motor  reaction  times 
(MRTs)  and  to  covary  with  MRTs  in  response  to  factors  such  as  task  difficulty  and  target 
predictability. 

Reeves, A., and G. Sperling. (1986) Attention gating in short-term visual memory. Psychological
Review, 93, 180-206.

This paper offers a computational model of a shift of visual attention, greatly enlarging on
the procedures of Sperling & Reeves (1980). An attention shift takes attention from its initial
location a to a second location b. While attention is focused at a, stimulus information from a
is admitted to further processing, and stimulus information from b is excluded. After the shift,
the roles of a and b are reversed. The process of shifting attention to b is conceptualized as the
opening of an attentional gate at b. In Sperling and Reeves's (1980) attentional task, location b
contains a rapid stream of characters, so the attention gate remains open at b only for a brief
period to avoid flooding memory with irrelevant items.

The  theory  assumes  that  the  fraction  of  stimulus  information  passed  on  to  higher  mental 
processes  from  a  location  in  space  and  a  moment  in  time  is  proportional  to  the  attentional 
allocation  at  that  location.  The  theory  contains  only  three  parameters:  First,  there  is  a  latency 
between  the  signal  to  shift  attention  and  the  start  of  the  attention  shift.  Second,  the  time  course  of 
gate opening is described by a second-order gamma function with a time constant, typically, of
several  hundred  msec.  Third,  there  is  the  amplitude  of  internal  noise  that  determines  the  signal- 
to-noise  ratio  of  the  internally  represented  information. 
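
The three parameters just listed can be illustrated with a rough sketch. It is not the published model: it assumes the gate has the unit-area second-order gamma form g(t) = (t / tau^2) exp(-t / tau) starting at the latency, that an item's strength is the gate integrated over the item's presentation interval, and that report order follows noisy strength. All function names and parameter values are hypothetical.

```python
import numpy as np

def gate_cdf(t, latency, tau):
    """Integral of the gate up to time t (seconds after the cue).

    Assumed gate: g(s) = (u / tau) * exp(-u), u = (s - latency) / tau, for s >= latency.
    """
    u = np.maximum(np.asarray(t, dtype=float) - latency, 0.0) / tau
    return 1.0 - (1.0 + u) * np.exp(-u)

def item_strengths(n_items, item_dur, latency, tau):
    """Gate area falling within each item's presentation interval."""
    edges = np.arange(n_items + 1) * item_dur
    return np.diff(gate_cdf(edges, latency, tau))

def reported_items(strengths, noise_sd, n_report=4, rng=None):
    """Indices of the n_report items with the largest noisy strengths, strongest first."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = strengths + rng.normal(0.0, noise_sd, size=len(strengths))
    return np.argsort(-noisy)[:n_report]

# Illustrative values only: 10 items per second, 150 ms latency, 200 ms time constant.
strengths = item_strengths(n_items=20, item_dur=0.1, latency=0.15, tau=0.2)
noiseless_report = reported_items(strengths, noise_sd=0.0, rng=np.random.default_rng(0))
```

With these illustrative values the gate peaks at latency + tau, so the strongest item is the one occupying the interval around 350 msec after the cue. Raising noise_sd scrambles the reported order among items of similar strength.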

The  data  set  is  quite  complex,  and  the  theory  makes  accurate  predictions  of  literally 
hundreds  of  data  points  with  these  few  parameters. 

Sperling,  George.  The  magical  number  seven:  Information  processing  then  and  now.  In  William 
Hirst  (Ed.),  The  making  of  cognitive  science:  Essays  in  honor  of  George  A.  Miller.  Cambridge, 
UK:  Cambridge  University  Press,  1988. 

This article analyzes why the magical number 7 ± 2 had such a major impact on cognitive
science; it is the most cited experimental/theoretical article in Psychology. The 7 ± 2 article
offers a theoretical account of absolute judgment (sensory categorization) experiments and of
short-term memory experiments. Both kinds of experiments have a limit of 7 (bits, and items,
respectively). There are no self-citations in the references. All of the evidence Miller used was
publicly available. Miller, like Sherlock Holmes, was the one who was able to formulate a
theory  to  encompass  these  data,  and  it  was  perhaps  the  first  plausible  quantitative  theory  to  deal 
with  the  microprocess  of  cognition. 

The  second  part  of  the  analysis  deals  with  the  current  status  of  Miller’s  proposals.  Miller’s 
seven-item  limit  turns  out  to  depend  on  factors  such  as  acoustic  confusability,  implying  that  the 
item  limit  is  based  on  a  sensory-based  acoustic  memory  rather  than  an  abstract  memory.  The 
review then points out that a single memory system, a stack of seven items, can encompass both
the bit and the item limits Miller had proposed. In a sensory categorization experiment, the seven
items in working memory are items with respect to which new items are judged. In a short-term
recall  experiment,  they  are  the  to-be-recalled  items.  Such  a  stack  memory  is  easily  embodied  in  a 
neural  network.  Thus,  a  simple  neural  network  memory  model  can  encompass  the  two  main 
tenets  of  Miller’s  magical  number  seven. 


Weichselgartner, E., and George Sperling. (1987) Dynamics of automatic and controlled visual
attention. Science, 238, 778-780.

This paper uses the Sperling & Reeves (1980) paradigm to isolate and measure the partially concurrent
time courses of automatic and controlled attentional shifts. The automatic component is extremely
rapid, very brief in duration, and relatively effortless. The controlled component has the same
time course as the previously measured attention shifts (Sperling & Reeves, 1980; Reeves &
Sperling, 1986), is slower, has a longer duration, and is effortful.

Sperling,  George,  and  Weichselgartner,  Erich.  (199x).  Episodic  theory  of  the  dynamics  of  spatial 
attention.  Psychological  Review.  (Under  revision.) 

This  paper  re-analyzes  previous  measurements  of  visual  attention  in  simple  reaction-time, 
choice  reaction-time  and  complex  discrimination  experiments  in  which  attention  was  purported  to 
move  continuously  across  space.  All  these  data  plus  data  from  attention  gating  experiments  were 
shown to be quantitatively predicted by a quantal (episodic) theory of spatial attention that
proposes instead: (a) visual attention can be resolved into a sequence of discrete attentional acts
(episodes); (b) each attentional episode is defined by its spatial facilitation function f(x, y); (c) the
transition at time t_0 between episodes is described by a temporal alerting/gating function G(t - t_0);
(d) f and G are space-time separable. In support of the theory, new experiments are reported
that  use  a  concurrent  motor  reaction-time  task  to  assess  changes  in  discriminability  with  distance. 
When  non-attentional  factors  are  corrected  for,  the  duration  of  an  attention  shift  is  independent  of 
the  spatial  distance  traversed  and  of  the  presence  or  absence  of  interposed  visual  obstacles.  New 
experiments  that  test  and  confirm  the  theory  are  reported. 
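
One way to read space-time separability is sketched below. This is an assumed reading for illustration, not a formula quoted from the paper: attention during a transition is taken to be a weighted mixture of the outgoing and incoming spatial functions, with the weight given by the temporal function G. The name attention_map is hypothetical.

```python
import numpy as np

def attention_map(f_old, f_new, g):
    """Mixture of two spatial attention functions.

    f_old, f_new: attention over locations for the outgoing and incoming episodes.
    g: value of the temporal transition function G(t - t_0), between 0 and 1.
    """
    f_old = np.asarray(f_old, dtype=float)
    f_new = np.asarray(f_new, dtype=float)
    return (1.0 - g) * f_old + g * f_new

# Three locations in a row: attention starts at the left one and ends at the right one.
quarter_way = attention_map([1.0, 0.0, 0.0], [0.0, 0.0, 1.0], g=0.25)
```

Under this reading the middle location receives no attention at any stage of the transition, and the time course g does not depend on how far apart the two locations are. This contrasts with a continuously moving "spotlight".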


Gegenfurtner,  K.  and  Sperling,  G.  (1993).  Information  transfer  in  iconic  memory  experiments. 
Journal  of  Experimental  Psychology:  Human  Perception  and  Performance,  1993,  19,  845-866. 

This  paper  investigates  the  role  of  selective  and  nonselective  transfer  processes  in  partial 
reports  of  information  from  briefly  exposed  letter  arrays.  In  order  to  report  letters,  viewers  must 
transfer  information  from  a  rapidly  decaying  persistence  trace  (iconic  memory)  to  a  more  durable 
short  term  memory.  At  some  time  following  termination  of  the  display,  subjects  are  cued  to 
report  a  particular  row  of  letters.  Transfer  that  occurs  prior  to  the  cue  is  nonselective;  transfer  that 
occurs after the cue is selective. (a) Performance is unaffected by 10:1 variations in the
probabilities of short and long cue delays. This implies that viewers use the same transfer
strategies  at  all  cue  delays,  (b)  Information  transfer  that  has  occurred  at  various  times  t  before 
and  after  the  cue  is  measured  by  using  a  post-stimulus  mask  at  time  t  to  eliminate  visual 
persistence.  Nonselective  and  selective  information  transfer  (before  and  after  the  cue)  are  shown 
to  combine  additively.  (c)  Positions  within  rows  differ  substantially  in  their  accuracy  of  report. 

A  simple  model  accounts  for  partial  report  (cued)  performance  at  different  cue  delays  both 
with  and  without  a  mask,  and  for  whole  report  (uncued)  performance.  (1)  The  time  course  of 
iconic  legibility  after  stimulus  termination  depends  on  the  retinal  location  (row).  (2)  Initial 
attention is directed to the middle row; subsequently it switches to the cue-designated row. (3)
The instantaneous location-specific legibility times the instantaneous state of attention, integrated
over  time,  determines  cumulative  transfer,  subject  to  the  capacity  limit  of  durable  storage.  A 
review  of  earlier  computational  approaches  shows  that  only  this  model  is  capable  of  giving  a  self- 
consistent  account  of  information  transfer  from  iconic  memory. 
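
Points (1) to (3) of the model can be illustrated with a rough numerical sketch. Several simplifications are assumed and are not taken from the paper: legibility decays exponentially with a row-specific time constant, attention switches to the cued row instantaneously at the cue, and no mask is modeled. The function name and all parameter values are hypothetical.

```python
import numpy as np

def cumulative_transfer(cue_delay, cued_row, decay_tau=(0.25, 0.35, 0.25),
                        rate=20.0, capacity=4.0, initial_row=1,
                        t_max=2.0, dt=0.001):
    """Letters transferred from each row to durable storage.

    Time t is measured in seconds from stimulus termination. At each time step the
    attended row gains rate * legibility * dt letters until capacity is reached.
    """
    transferred = np.zeros(len(decay_tau))
    for t in np.arange(0.0, t_max, dt):
        room = capacity - transferred.sum()
        if room <= 0.0:
            break  # durable storage is full
        # Attention rests on the initial (middle) row until the cue arrives.
        row = initial_row if t < cue_delay else cued_row
        legibility = np.exp(-t / decay_tau[row])  # assumed exponential decay
        transferred[row] += min(rate * legibility * dt, room)
    return transferred

immediate_cue = cumulative_transfer(cue_delay=0.0, cued_row=0)
delayed_cue = cumulative_transfer(cue_delay=0.1, cued_row=0)
```

In this sketch a delayed cue lets part of the limited capacity fill with letters from the middle row before selective transfer begins, so fewer letters from the cued row reach durable storage.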

1991 Sperling, G. and Wurst, S. A. (1991). Selective attention to an item is stored as a feature of

Research in Vision and Ophthalmology, Sarasota, Florida, May 1, 1990. Further measurements
of the spatial frequency selectivity of second-order texture mechanisms.

1991 †George Sperling, Neural Networks for Vision and Image Processing. An International Conference
Sponsored by Boston University’s Wang Institute, Center for Adaptive Systems, Tyngsboro,
MA 01879, May 11, 1991. Two Systems of Visual Processing.

1991 †George Sperling, Neural and Visual Computation Symposium, Center for Neural Sciences, New
York University, NY, May 31, 1991. The Spatial, Temporal, and Featural Mechanisms of
Visual Attention.

1991 †George Sperling, National Academy of Sciences, National Research Council, Committee on
Vision, Conference of Visual Factors in Electronic Image Communications, Woods Hole, MA,
July 23, 1991. Empirical Observations on Image Compression and Comprehension.

1991 †George Sperling, The International Society for Psychophysics, Washington Duke Inn, Duke
University, Durham, North Carolina, New York University, NY, October 19, 1991. The
Featural Mechanism of Visual Attention.

1991 †*Chubb, C., Solomon, J. A. and Sperling, G. Invited paper presented by Charles Chubb. Optical
Society of America, San Jose, California, November 7, 1991. Contrast Contrast Determines
Perceived Contrast.

1991  *George  Sperling  and  Stephen  Wurst,  Paper  presented  by  George  Sperling.  Psychonomic 
Society,  San  Francisco,  California  November  22,  1991.  Selective  Attention  to  an  Item  is  Stored 
as  a  Feature  of  the  Item. 

1992  *Shui-I  Shih  and  George  Sperling.  Eastern  Psychological  Association,  Boston,  Massachusetts, 
April  4,  1992.  Cluster  Analysis  as  a  Tool  to  Discover  Covert  Strategy. 

1992 *Werkhoven, P., Sperling, G., and Chubb, C. Association for Research in Vision and Ophthalmology,
Sarasota, Florida, May 6, 1992. The Dimensionality of Motion From Texture.

1992 *Werkhoven, P., Sperling, G., and Chubb, C. Optical Society of America, Albuquerque, New
Mexico, September 25, 1992. Energy Computations in Motion and Texture.

1992 *George Sperling and Hai-Jung Wu, Paper presented by George Sperling. Psychonomic
Society, Saint Louis, Missouri, November 15, 1992. Defining and Teaching Objectively Accurate
Confidence Judgments.

1993  ’tGeorge  Sperling,  Linking  Psychophysics,  Neort^hysiology,  and  Computational  Vision:  A 
Conference  to  Celebrate  Bela  Julesz*  6Sth  Birthday.  Rutgers  University,  New  Brunswick,  NJ. 
May  1,  1993.  Spatial,  Temporal,  and  Featural  Mechanisms  of  Visual  Attention. 

1993  ’Solomon,  J.  A.  and  Sperling,  G.  Talk  presented  by  Joshua  A.  Solomon.  Association  for 
Research  in  Vision  and  Ophthalmology.  Sarasota,  Florida,  May  4,  1993.  Fultwave  and 
Halfwave  Rectification  in  Motion  Perception. 

1993  ’Shih,  Shui-I  and  Sperling,  G.  Talk  presented  by  Shui-1  Shih.  Association  for  Research  in 
Vision  and  Ophthalmology,  Sarasota,  Florida,  May  6,  1993.  Visual  Search,  Visual  Attention, 
and  Feature-Based  Stimulus  Selection. 

1993  ’Lu,  Zhong-Lin  and  Sperling,  G.  (1993)  Talk  presented  by  Zhong-Lin  Lu.  Association  for 
Research  in  Vision  and  Ophthalmology,  Sarasota,  Florida,  May  6,  1993.  2nd-Order  Illusions: 
Mach  bands,  Craik — O'Brien — Comsweet. 

1993  ’Chubb,  C.,  Darcy,  J.  and  Sperling,  G.  Talk  presented  by  Charles  Chubb.  Association  for 
Research  in  Vision  and  Ophthalmology,  Sarasota,  Florida,  May  6,  1993.  Metameric  Matches  in 
the  Space  of  Textures  Comprised  of  Small  Squares  with  Jointly  Independent  Intensities. 

1993 *†Sperling, George and Dosher, Barbara A. Talk presented by George Sperling. Linking
Psychophysics,  Neurophysiology  and  Computational  Vision.  A  Conference  to  Celebrate  Bela 
Julesz’  65th  Birthday.  Rutgers  University,  New  Brunswick,  New  Jersey,  May  1,  1993. 
Structure-from-motion:  Algorithms,  Illusions,  Mechanisms. 

1993 †Sperling, George. Geometric Representation of Perceptual Phenomena. A Conference in
Honor  of  Tarow  Indow.  University  of  California,  Irvine.  July  28,  1993.  The  Representation 
of  Motion  and  Texture. 

1993 †Sperling, George. Society for Mathematical Psychology, Twenty-Sixth Annual Meeting,
Norman, Oklahoma. Plenary lecture. August 17, 1993. Second-Order Perception.

1993 *†Sperling, George. Ciba Foundation Symposium No. 184. Higher-Order Processing in the
Visual  System.  The  Ciba  Foundation,  41  Portland  Place,  London,  UK.  October  21,  1993. 
Full-Wave  and  Half-Wave  Mechanisms  in  Motion  and  Texture  Perception. 

1993 †Sperling, George. International Workshop on Digital Video for Intelligent Systems. Hosted
by Department of Electrical and Computer Engineering, University of California, Irvine,
California. December 17, 1993. An engineering model of human visual processing/intelligibility of
extremely reduced images.

George Sperling: Invited Lectures at Universities and Institutes

1991  Department  of  Psychology  Colloquium,  University  of  California,  Irvine,  Irvine,  CA,  January  10, 
1991.  Visual  Preprocessing. 

1991 Department of Psychology, University of California at San Diego, La Jolla, CA, February 28,
1991.  Mechanisms  of  Attention. 

1991 University of California, Berkeley, Berkeley, California, Joint Cognitive Science Colloquium
and  Oxyopia  Colloquium  (Optometry  School).  March  22,  1991.  Visual  Preprocessing. 

1991 University of California, Berkeley, Berkeley, California, Department of Psychology/Cognitive
Science  Colloquium,  March  22,  1991.  The  Spatial,  Temporal,  and  Featural  Mechanisms  of 
Visual  Attention. 

1991  Bonny  Center  for  the  Neurobiology  of  Learning  and  Memory,  University  of  California,  Irvine, 
Irvine,  CA,  April  8,  1991.  Mechanisms  of  Visual  Attention. 

1991  Salk  Institute,  University  of  California  at  San  Diego,  La  Jolla,  CA,  April  10,  1991.  Visual 
Preprocessing. 

1991 Department of Psychology, University of Florida at Gainesville, April 26, 1991. Systems and
Stages  of  Visual  Processing. 

1991 Shanghai Institute of Technical Physics, Shanghai, China, June 17, 1991. How the Human
Visual System Computes Visual Motion. [Host: Prof. Kuang, Ding Bo (Director, SITP);
Translators: Dr. Zhang, Ming and Chen, Lulin.]

1991 Department of Computer Science, Shanghai Information-Technology Engineers Examination
Center, Fudan University, Shanghai, China, June 18, 1991. Neural Principles of Preprocessing
for Human Pattern Recognition. [Host: Prof. Wu, Lide (Director, SITEEC).]

1991 Department of Electronic Science and Technology, Institute of Applied Electronics, East China
Normal University, Shanghai, China, June 20, 1991. Measuring Attention and How the Human
Visual System Computes Visual Motion. [Host: Prof. Weng, Moying (Chairman and Director);
Translator: Dr. Zhang, Ming.]

1991  Department  of  Psychology,  Beijing  University,  and  Institute  of  Psychology,  Chinese  Academy 
of  Sciences,  Beijing,  China,  June  23,  1991.  [Host:  Prof.  Jing,  Qicheng  (Director,  Institute  of 
Psychology)] 

Morning: The Efficiency of Perception. [Translators: Dr. Zhang, Ken and Prof. Jing,
Qicheng.]

Afternoon:  Measuring  Attention.  [Translator  Luo,  Chun-Rong.] 

1991 Computational Vision Laboratory, Institute of Biophysics, Chinese Academy of Sciences,
Beijing, China, June 28, 1991. First- and Second-Order Motion Perception. [Host: Prof. Wang
Shuo-Rong (Director, Institute of Biophysics); Translator: Prof. Wang, Yun-Jiu (Laboratory
Director).]

1991 New York University, Cognitive Sciences Colloquium, September 12, 1991. Is There
Attentional Filtering of Items by Feature as Well as by Location?

1992 Center for Adaptive Systems, Boston University, February 25, 1992. Is There Attentional
Selection of Items by Feature as Well as by Location?

1992 University of Delaware, Department of Psychology Colloquium, March 4, 1992. Can Visual
Attention Filter Items by Feature?

1993 University of California, Irvine, Department of Cognitive Sciences, Vision Lunch Series,
January 13, 1993. 2nd-Order Motion Perception.

1993  University  of  California,  Irvine,  Bren  Fellows  Program,  Learned  Societies  Luncheon,  UCI 
University  Club,  March  9,  1993.  Modeling  Mental  Microprocesses. 

1993 University of California, Santa Barbara, First Annual Gottsdanker Memorial Lecture
(Department of Psychology). May 27, 1993. A Theory of Spatial Attention.

1993  Kenneth  Craik  Club,  University  of  Cambridge,  Cambridge,  England,  October  25,  1993.  Early 
Visual  Processing. 

1993  University  of  California,  Berkeley.  December  3,  1993.  A  Theory  of  Spatial  Attention. 


George  Sperling.  Spatial,  temporal,  and  featural  mechanisms  of  visual 
attention.  Spatial  Vision,  1993, 7,  86. 


Spatial,  temporal,  and  featural  mechanisms  of  visual 
attention 

GEORGE  SPERLING 

Department of Cognitive Sciences, University of California, Irvine, CA, USA

Spatial selective attention is determined by an instruction to attend to a location (or set of locations)
X, and temporal attention is determined by an instruction to attend during an interval T. Attentional
dynamics are studied by instructions to attend first to X1 and then to X2. To measure these forms of
attention, x,y,t space is populated with items, and the x,y,t coordinates of all the
attentionally-processed items are determined. Results indicate attention can be represented as a
sequence of partially-overlapping space-time separable episodes: attention shifts are discrete, not
continuous. Each attentional episode i is characterized by a particular spatial distribution of attention
f(i; x) and a temporal period g(i; t) during which f(i; x) is effective. The theory applies to many
attentional tasks: go/no-go reaction times, choice reaction times, accuracy in cued search, attentional
gating paradigms, and partial report. Attention to features (versus attention to a location in space)
is studied by presenting a rapid sequence of items containing different features at a single location and
requiring attention only to items that contain a particular feature. Featural attention occurs both early
(items with unattended features are selectively excluded from memory) and late (attention selectively
affects retrieval).
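
The episode description above can be made concrete with a small sketch. It assumes, purely for illustration, that each f(i; x) and g(i; t) is a box function and that total attention is the sum over episodes of f(i; x) * g(i; t); the locations, time windows, and function names are invented and are not taken from the paper.

```python
# Hypothetical sketch: attention as a sum of space-time separable episodes.
# All names and parameter values are illustrative assumptions.

def box(lo, hi):
    """Return a function that is 1.0 on [lo, hi) and 0.0 elsewhere."""
    return lambda v: 1.0 if lo <= v < hi else 0.0

def attention(episodes, x, t):
    """Total attention at location x and time t: sum over i of f_i(x) * g_i(t)."""
    return sum(f(x) * g(t) for f, g in episodes)

# Episode 1 attends to location X1 = [0, 2) during t in [0, 300) ms;
# episode 2 attends to X2 = [5, 7) during t in [200, 600) ms.
# The two temporal windows overlap for 200 <= t < 300.
episodes = [
    (box(0, 2), box(0, 300)),
    (box(5, 7), box(200, 600)),
]

early = attention(episodes, x=1, t=100)        # only episode 1 is active
overlap_x1 = attention(episodes, x=1, t=250)   # both episodes active in time
overlap_x2 = attention(episodes, x=6, t=250)
between = attention(episodes, x=3.5, t=250)    # a location between X1 and X2
late = attention(episodes, x=6, t=400)         # only episode 2 is active
```

In this toy model the location between X1 and X2 never receives attention, even while the two episodes overlap in time, which is one way to read "discrete, not continuous" shifts.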


Shui-I Shih and George Sperling, Visual Search, Visual Attention, and
Feature-Based Stimulus Selection. Investigative Ophthalmology and
Visual Science, 1993, Pg. 1288, No. 2885.


VISUAL SEARCH, VISUAL ATTENTION, AND FEATURE-BASED STIMULUS
SELECTION. Shui-I Shih¹ and George Sperling²

¹Saint Anselm College, Manchester, NH and ²University of California, Irvine, CA.

In rapid visual search, does selective attention to the value of a particular physical
feature permit early selection of items with that feature value (e.g., red) while items with a
different value (e.g., green) are rejected? Or, does a physical feature only direct attention to
a location, with subsequent selection being based on locational selection or on complex
decision processes? To disentangle the effects of feature selection and location selection, we
use a search paradigm that combines attentional cuing and RSVP (rapid serial visual
presentation). Two dimensions, size (small/large) and color (red/green), were studied.

Selective attention to different features was jointly manipulated by instructions,
presentation probabilities, and payoffs. On each trial, a visual cue indicated the to-be-attended
color or size. The subject then searched a rapid sequence of character arrays (6 letters
arranged in a circle around fixation) for a single unknown digit among the letter distractors.
The feature-cue indicated the probability of that feature value in the target digit (e.g., 50%,
80%, 100% red). The noncued dimension (e.g., size) was neutral; the target's spatial
location and identity were chosen randomly and independently. The task was to identify and
localize the digit and to identify its feature value in the cued dimension.

In Expt 1, all items in an array had the same feature value, and successive arrays
alternated in feature value (e.g., red/green). Subjects were not more successful in detecting
attended-feature targets than unattended targets. In Expt 2, successive arrays also
alternated features but the target and exactly one item in every other array had a unique feature
value. Now, subjects benefited from reliable attentional cues. Thus, attentional cues to a
physical feature were useful only when they served to direct attention to a spatial location,
not otherwise.

Conclusion:  In  RSVP  visual  search,  spatial  location  is  the  means  by  which  feature- 
based  stimulus  selection  is  accomplished. 

Supported by ONR Perceptual Sciences Program, Grant N00014-88-K-0569 and by AFOSR Life Sciences,
Visual Information Processing Program, Grant 91-0178.

J.A. Solomon and G. Sperling. Fullwave and Halfwave Rectification in
Motion Perception. Investigative Ophthalmology and Visual Science,
1993, Pg. 976, No. 1347.


FULLWAVE AND HALFWAVE RECTIFICATION IN MOTION
PERCEPTION. J.A. Solomon¹ and G. Sperling²

¹Syracuse University, NY; ²University of California, Irvine.

Microbalanced stimuli are dynamic displays that do not stimulate 1st-order motion
mechanisms: mechanisms that apply standard (Fourier energy or autocorrelational)
motion analysis directly to the visual signal. Purpose. To characterize the perceptual
transformations of the visual signal, prior to standard motion analysis, that expose the
motion of microbalanced stimuli. Methods. Two kinds of microbalanced stimuli are tested:
(1) halfwave stimuli, for which motion information is exposed by halfwave rectification
but lost following fullwave rectification and (2) fullwave stimuli, for which motion
information is exposed by fullwave rectification but lost following halfwave rectification.
Additionally, an ordinary squarewave luminance grating was used to stimulate 1st-order
mechanisms (the Fourier stimulus). Results. Given sufficient contrast, both
fullwave and halfwave stimuli convey motion. All observers perceive fullwave motion;
only 1/3 can perceive halfwave motion. Remarkably, fullwave stimuli are perceived with
slightly greater quantum efficiency than Fourier stimuli, and much more efficiently than
halfwave stimuli. Tests of motion transparency reveal that when either fullwave and
Fourier or halfwave and Fourier gratings are briefly presented simultaneously, there is a
wide range of relative contrasts over which the directions of both gratings can be
accurately perceived. Conversely, when halfwave and fullwave gratings are added, both
motions are perceived, but subjects cannot tell which is which. Conclusions. Motion
transparency between Fourier and microbalanced stimuli implies two parallel motion
systems. Subjects' failures to discriminate halfwave from fullwave motion in the
transparency task suggest that the halfwave system, for those who possess it, is not labeled
differently from the fullwave system.

Supported by AFOSR Visual Information Processing Program, Grant 91-0178.
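
The two rectifiers named in this abstract can be illustrated on a toy signal. The sketch below assumes the usual pointwise definitions (fullwave = absolute value, positive halfwave = keep increments and zero out decrements) applied to signed contrast values; the texture values are invented, and the construction of actual microbalanced motion stimuli is considerably more involved than this.

```python
# Illustrative only: pointwise rectifiers applied to signed contrast values.

def fullwave(c):
    """Fullwave rectification: the absolute value of the contrast."""
    return abs(c)

def halfwave_pos(c):
    """Positive halfwave rectification: keep increments, zero out decrements."""
    return max(c, 0.0)

# A toy texture whose pixels are +0.5 or -0.5 (values assumed for illustration).
texture = [0.5, -0.5, -0.5, 0.5, 0.5, -0.5, 0.5, -0.5]

mean_raw = sum(texture) / len(texture)                            # signed mean
mean_full = sum(fullwave(c) for c in texture) / len(texture)      # mean after |c|
mean_half = sum(halfwave_pos(c) for c in texture) / len(texture)  # mean after max(c, 0)
```

The signed mean of this texture is zero, so a mechanism that reads the raw signal sees no net contrast, while either rectifier converts the same pixels into a nonzero mean that later linear analysis could use.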


Zhong-Lin Lu and George Sperling. 2nd-Order Illusions: Mach Bands,
Craik-O'Brien-Cornsweet. Investigative Ophthalmology and Visual Science,
1993, Pg. 1289, No. 2889.


2ND-ORDER ILLUSIONS: MACH BANDS, CRAIK-O'BRIEN-CORNSWEET
Zhong-Lin Lu and George Sperling, University of California, Irvine, CA 92717.

Previously, Chubb, Sperling & Solomon¹ demonstrated reduction of the perceived contrast
of a textured test patch when surrounded by a textured area of a similar spatial frequency
delivered to the same eye. Here, we demonstrate manifestations of lateral textural interactions
in analogs of luminance illusions in fullwave random textures. (A fullwave random
texture has constant mean luminance but becomes equivalent to a luminance pattern upon
fullwave rectification.) Mach bands are demonstrated in two fullwave textures. First, a random
texture in which the contrast of each pixel is chosen randomly and independently to be
either +c or -c; second, a texture constructed out of 'Mexican hats' (center-surround)
micropatterns that are randomly center-light (+c) or center-dark (-c). The Mach band illusion
is produced by making c(x) a ramp function of x that varies from c = 0.20 to 0.80.
An induced band of low contrast is perceived at the bottom of the ramp, and a band of high
contrast near the top of the ramp. These subjective impressions were quantified by
using an interleaved staircase procedure to compare the contrast of a vertical slice of the
Mach band pattern to an adjacent texture bar that varied in contrast from trial to trial. In
control sessions, luminance Mach bands were measured similarly. The magnitude of the
perceptual Mach bands was similar for the luminance stimulus and for the two fullwave
textures. Additionally, a fullwave texture stimulus and a luminance stimulus exhibit
Craik-O'Brien-Cornsweet illusions of similar magnitudes (the stimulus [inline profile sketch] looks
like [inline profile sketch]). However, none of these interactions could be demonstrated for halfwave
stimuli, i.e., stimuli that become luminance stimuli after halfwave rectification but are neutral
to Fourier and to fullwave analyses. Together, these results indicate that the perceptual
processes governing 2nd-order spatial interactions, like those governing 2nd-order motion
perception, reflect primarily fullwave (versus halfwave) rectification. As in 2nd-order motion
perception, 2nd-order spatial processing (after fullwave rectification of the stimulus) is
remarkably similar to first-order luminance processing.

¹Chubb, C., Sperling, G., & Solomon, J. A. (1989). Proc. Natl. Acad. Sci. USA 86, 9631-9635.

Supported by AFOSR Life Sciences, Visual Information Processing Program, Grant 91-0178.
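
The first of the two textures described above (pixels of contrast +c or -c, with c(x) a ramp from 0.20 to 0.80) can be sketched in one dimension. This is a minimal sketch under stated assumptions: the ramp is taken to be linear across the whole row, the row width and random seed are arbitrary, and `ramp_contrast` and `fullwave_texture_row` are hypothetical helper names, not the authors' stimulus code.

```python
import random

def ramp_contrast(x, width, c_lo=0.20, c_hi=0.80):
    """Linear ramp c(x) from c_lo at x = 0 to c_hi at x = width - 1 (assumed shape)."""
    return c_lo + (c_hi - c_lo) * x / (width - 1)

def fullwave_texture_row(width, rng):
    """One row of texture: pixel x has contrast +c(x) or -c(x) with equal probability."""
    return [rng.choice((-1.0, 1.0)) * ramp_contrast(x, width) for x in range(width)]

rng = random.Random(0)      # fixed seed so the sketch is repeatable
width = 61
row = fullwave_texture_row(width, rng)

# Fullwave rectification discards the random sign and recovers the ramp c(x).
rectified = [abs(c) for c in row]
```

Because the sign of each pixel is random, the expected signed contrast is zero at every x, yet the rectified row is exactly the contrast ramp, which is the sense in which such a texture "becomes equivalent to a luminance pattern upon fullwave rectification".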




The  Dimensionality  of  Motion-from-Texture 

Peter Werkhoven,¹ George Sperling,¹ and Charles Chubb²

¹Human Information Processing Laboratory, New York University, NY NY 10003, and
²Psychology Department, Rutgers University, New Brunswick, NJ 08903.

Texture-defined motion is 2nd-order apparent motion produced by consecutive
patches of texture that are constructed to have no useful Fourier motion
components.¹ A general model of the perception of texture-defined motion proposes
multiple independent nonlinear transformations of the optical input (channels), each
channel being followed by standard motion analysis. Here we present an
experimental paradigm and a theoretical analysis to determine the dimensionality of (i.e.,
the number of channels used in) the motion-from-texture computation and their
sensitivity to spatial frequency, orientation and contrast.

Each display contains two competing apparent motion paths. Each path consists
of alternating patches of different types of texture. The mean luminance of all
textures is equal to the background luminance, and their phases are randomized so that
motion detection cannot be based on a direct correlation of the luminance patterns
(1st-order motion). We demonstrate heterogeneous motion paths in which
consecutive textures differ by two octaves in spatial frequency, by a factor of two in
contrast, and have perpendicular orientations, and nevertheless these heterogeneous
paths may have the same or greater strength of apparent motion than homogeneous
motion paths that consist entirely of either type of texture alone. Such striking
counterintuitive results obtain for broad ranges of spatial and temporal frequencies.
These and similar results, together with our previous results (ARVO 1991), define a
unique single-channel computation for the detection of texture-defined motion.
That is, all these sinusoidal texture-defined motion stimuli are preprocessed by a
single nonlinear transformation (a broadly tuned texture grabber with a preference
for low spatial frequencies) followed by standard motion analysis.

¹Chubb, C. & Sperling, G. (1991). Texture quilts: Basic tools for studying motion-from-texture. J.
Math. Psychol. 35.
Supported by AFOSR Visual Information Processing Program, Grant 91-0178.

CLUSTER ANALYSIS AS A TOOL TO DISCOVER COVERT
STRATEGIES.

Shui-I Shih and George Sperling. New York University.

Cluster analysis is proposed as a tool to discover whether and
how performance in a series of sessions (or trials) depends on
the effect of covert strategies. We illustrate its value by applying
it to a large short-term memory experiment to analyze the
change in individual performances with practice. Data were
collected from 34+ sessions for two subjects. The results for
each session were characterized by 54 dependent variables.
ANOVA showed a main effect for dependent variables but no
main effect for sessions. A Tukey test yielded highly significant
interaction of sessions x dependent variables for both subjects,
indicating that there were complex, essentially unanalysable,
changes in performance over sessions. On the other hand, cluster
analysis discovered a partition of the sessions into two clusters
with distinctively different performances for one of the subjects.
The nearly chronological correlation between the clusters
and sessions means that practice produced the change in strategy;
strategy being defined by the performances in the clusters.
In conclusion, we recommend using both a statistical model
(ANOVA) and an analytical tool (cluster analysis) to discover
and characterize the effects of practice and covert strategies.
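
As an illustration of partitioning sessions into two clusters, here is a minimal sketch. The abstract does not say which clustering algorithm was used; this sketch assumes a plain two-cluster k-means on per-session vectors of dependent variables, with invented toy data (7 sessions by 3 variables rather than 34+ sessions by 54 variables) and hypothetical function names.

```python
def dist2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def centroid(points):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(col) / n for col in zip(*points)]

def two_means(sessions, iterations=20):
    """Partition session vectors into two clusters (labels 0 and 1).

    The initial centers are the first and last sessions, an arbitrary
    choice made here so that the sketch is deterministic.
    """
    centers = [list(sessions[0]), list(sessions[-1])]
    labels = []
    for _ in range(iterations):
        # Assignment step: each session joins its nearer center.
        labels = [0 if dist2(s, centers[0]) <= dist2(s, centers[1]) else 1
                  for s in sessions]
        # Update step: each center moves to the mean of its members.
        for k in (0, 1):
            members = [s for s, lab in zip(sessions, labels) if lab == k]
            if members:
                centers[k] = centroid(members)
    return labels

# Toy data in chronological order (invented numbers).
sessions = [
    [0.50, 0.40, 0.30],
    [0.52, 0.41, 0.29],
    [0.49, 0.43, 0.31],
    [0.80, 0.20, 0.60],
    [0.51, 0.42, 0.30],   # a later session that still resembles the early ones
    [0.82, 0.19, 0.61],
    [0.79, 0.21, 0.59],
]
labels = two_means(sessions)
```

In this toy example the labels follow session order except for one session, which mimics the "nearly chronological" relation between clusters and sessions described in the abstract.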

George Sperling and Stephen A. Wurst. Selective Attention to an Item Is Stored as a Feature
of the Item. Bulletin of the Psychonomic Society, 1991, 29, 473.


8:00-8:20 (1)

Selective Attention to an Item Is Stored as a Feature of the Item.
GEORGE SPERLING, New York University, & STEPHEN A. WURST,
SUNY at Oswego. Subjects must detect a repetition in a stream of 30
characters flashed at 10 per second. Items alternate in either color
(black/white), size, orientation, or spatial frequency. Selectively attending
to a feature (e.g., black) never improves detection of repeated attended
(black) versus unattended (white) items. Many counterintuitive results
are explained by assuming (1) all items are stored in short-term memory
(there is no perceptual filtering) and (2) attention to an item is itself
stored as a feature of that item.


Anne Sutter, George Sperling and Charles Chubb. Further Measurements
of the Spatial Frequency Selectivity of Second-Order Texture Mechanisms.
Investigative Ophthalmology and Visual Science, 1991, 32, No. 4, ARVO
Supplement, 1039.


FURTHER MEASUREMENTS OF THE SPATIAL FREQUENCY
SELECTIVITY OF SECOND-ORDER TEXTURE MECHANISMS
Anne Sutter, George Sperling, & Charles Chubb
Human Information Processing Laboratory, New York University, NY, NY 10003

A number of investigations of texture and motion perception suggest a
two-stage processing system consisting of an initial stage of selective linear
filtering, followed by a rectification and a second stage of selective linear
filtering. Here we present new data measuring two properties of the second-stage
filters: their contrast modulation sensitivity as a function of spatial frequency
(MTF), and the relation of initial spatial filtering to second-stage selectivity. To
determine the MTF, we used a staircase procedure to obtain amplitude
modulation thresholds for the detection of the orientation of Gabor modulations
of a bandlimited noise carrier. We used improved noise carriers with a narrower
bandwidth than the stimuli reported last year. Four carrier bands were created
with center frequencies of 2, 4, 8, and 16 c/deg. The spatial frequency of the test
signals (Gabor amplitude modulations) ranged from 0.5 to 8 c/deg.

The improvements in our stimuli produced a different pattern of results: (1)
The threshold amplitude of signal modulation was lowest for 0.5 and 1.0 c/deg.
Above 1.0 c/deg, threshold increased with frequency¹. (2) There was a
significant interaction of carrier frequency band with the modulating frequency,
with the lowest thresholds occurring for carrier frequency/modulation frequency
ratios of about three to four octaves. These results indicate that the second-stage
selective filters and detectors are most sensitive to frequencies lower than or equal
to 1 c/deg, and that they are selective with regard to the spatial frequency content
of the carrier noise on which the signals are impressed.

¹Jamar, J. H. T. & Koenderink, J. J. (1985). Vis. Res. 25 (4) pp. 511-521.
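
The rectify-then-filter part of the two-stage scheme described in this abstract can be sketched in one dimension. The sketch rests on simplifying assumptions that are not in the source: the carrier is a deterministic alternating pattern rather than bandlimited noise, the contrast modulation is a step rather than a Gabor, the first-stage selective filter is omitted, and the second-stage filter is a plain moving average.

```python
def moving_average(signal, window):
    """Stand-in for a second-stage linear filter: mean over a sliding window."""
    return [sum(signal[i:i + window]) / window
            for i in range(len(signal) - window + 1)]

n = 32
# Fine-grained carrier: alternating +1 / -1 (assumed in place of bandlimited noise).
carrier = [1.0 if x % 2 == 0 else -1.0 for x in range(n)]
# Coarse contrast modulation: low contrast on the left half, high on the right.
envelope = [0.25 if x < n // 2 else 0.75 for x in range(n)]
stimulus = [e * c for e, c in zip(envelope, carrier)]

# Linear filtering alone averages the carrier away (apart from a small
# residual where the window straddles the contrast edge).
linear_only = moving_average(stimulus, 4)

# Fullwave rectification followed by the same filter recovers the envelope.
rectified_then_filtered = moving_average([abs(s) for s in stimulus], 4)
```

The coarse filter applied to the raw stimulus reports almost nothing about the modulation, while the same filter applied after rectification reports the low and high contrast levels, which is why a nonlinearity between the two linear stages is needed to detect the modulation.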

TEXTURE-DEFINED MOTION IS RULED BY AN ACTIVITY METRIC -
NOT BY SIMILARITY

Peter Werkhoven, Charles Chubb and George Sperling.

Human Information Processing Laboratory, New York University

We examined motion carried by textural properties. The stimuli we
used consisted of patches of sinusoidal grating of various spatial
frequencies and contrasts. Phases were randomized to ensure that motion
mechanisms sensitive to correspondences in stimulus luminance were not
systematically engaged.

We used an ambiguous apparent motion paradigm in which a
"heterogeneous" motion path (defined by alternating patches of a type A
and a type B texture) competes with a "homogeneous" motion path defined
by patches of type A. We found that the strength of these (2nd-order)
motion stimuli is determined by the covariance of the activity of the
textures that define the motion paths. The activity of a texture is a
hypothesized property that is proportional to the texture's contrast and is
found to be inversely proportional to its spatial frequency (within the range
of spatial frequencies examined). Indeed, heterogeneous motion between
equal contrast patches of a high spatial-frequency texture A and a low
spatial-frequency texture B can easily dominate homogeneous motion
between two patches of A because the activity of texture B is higher than
that of texture A.

At temporal frequencies higher than 4 Hz, we find that activity
covariance almost exclusively determines motion strength. At lower
temporal frequencies, similarity between textures becomes a significant
factor as well.

Supported by AFOSR Life Sciences, Visual Information Processing Program, Grant 88-0140.
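
The worked case in this abstract (an A-to-B path beating an A-to-A path) can be reproduced with a toy calculation. The sketch assumes a proportionality constant of 1 for activity and uses the product of the two textures' activities as a simple stand-in for the "covariance of the activity"; both choices, the numerical values, and the function names are illustrative assumptions, not the authors' model.

```python
def activity(contrast, spatial_freq):
    """Hypothesized activity: proportional to contrast, inversely proportional
    to spatial frequency (proportionality constant assumed to be 1)."""
    return contrast / spatial_freq

def path_strength(act_1, act_2):
    """Assumed proxy for motion strength along a path: the product of the
    activities of the two textures that alternate along it."""
    return act_1 * act_2

# Equal-contrast textures: A has a high spatial frequency, B a low one
# (contrast and frequency values are invented).
act_a = activity(contrast=0.5, spatial_freq=4.0)
act_b = activity(contrast=0.5, spatial_freq=1.0)

homogeneous = path_strength(act_a, act_a)     # path through patches of A only
heterogeneous = path_strength(act_a, act_b)   # path alternating A and B
```

Under these assumptions the heterogeneous path is the stronger one simply because texture B has the higher activity, with no reference to how similar A and B look.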

CAN WE SEE 2nd-ORDER MOTION AND TEXTURE IN THE PERIPHERY?
Joshua A. Solomon and George Sperling,

Human Information Processing Laboratory, New York University

Stimuli. Our 1st-order stimuli are moving sine gratings. Our 2nd-order stimuli
are patches of static visual noise, whose contrasts are modulated by moving sine
gratings. Neither the spatial orientation nor the direction of motion of these
2nd-order (drift-balanced) stimuli can be detected by analysis of their Fourier domain
power spectra. They are invisible to Reichardt and motion-energy detectors.

Method. For these dynamic stimuli, in the fovea, and at 12 deg eccentricity, we
measured contrast modulation thresholds as a function of spatial frequency for
discrimination of ±45 deg texture slant and for discrimination of direction of
motion. Spatial frequency was varied by changing viewing distance.

Results. For sufficiently low spatial frequencies and sufficiently large contrast
modulations, all stimuli are visible both foveally and peripherally. For peripherally
viewed 1st-order gratings, the highest spatial frequency at which motion or texture
discrimination is possible is about 1/4 that at which the corresponding
discrimination is possible for foveally viewed gratings. For peripherally viewed
2nd-order gratings, the highest spatial frequencies at which motion or texture
discrimination are possible are somewhat less than 1/4 the frequencies of the
corresponding foveal discriminations. Thus, as the stimulus moves peripherally,
the visual mechanisms that detect 2nd-order motion and texture lose sensitivity
somewhat faster than the 1st-order mechanisms.

Conclusions. Under certain specific assumptions, our results suggest the
following about the neural detectors involved in these discriminations: (1) For both
motion and texture, there are more foveal than peripheral detectors at all spatial
frequencies. (2) There are more 1st-order than 2nd-order detectors. (3) On the
average, foveal detectors respond to higher spatial frequencies than peripheral
detectors. (4) The 2nd-order foveal-peripheral spatial frequency difference is
somewhat larger than the 1st-order difference.

The Lateral Inhibition of Perceived Contrast
is Indifferent to On-Center/Off-Center
Segregation, but Specific to Orientation

JOSHUA A. SOLOMON,* GEORGE SPERLING,† CHARLES CHUBB†
Received 27 January 1993; in revised form 21 May 1993


When a central test patch C, composed of an isotropic spatial texture, is surrounded by a texture
field S, the perceived contrast of C depends substantially on the contrast of the surround S. When
C is surrounded by a high contrast texture with a similar spatial frequency content, it appears to
have less contrast than when it is surrounded by a uniform field. Here, we employ two novel textures:
T⁺, which is designed to selectively stimulate only the on-center system, and T⁻, the off-center system.
When C and S are of type T⁺ and T⁻, the reduction of C's apparent contrast does not vary with
the combination of T⁺, T⁻. This demonstrates that the reduction of C's apparent contrast is mediated
by a mechanism whose neural locus is central to the interaction between on-center and off-center visual
systems. We further demonstrate orientation specificity: the reduction of grating C's apparent contrast
by a surround grating S of the same spatial frequency is greatest when C and S have equal orientation.
Using dynamically phase-shifting sinusoidal gratings of 3.3, 10 and 20 c/deg, we measured reduction
of apparent contrast using different contrast-combinations of C and S. Results: (1) S gratings, both
parallel and perpendicular to C, cause a reduction in C's apparent contrast relative to a uniform
surround. (2) In all of the viewing conditions, the reduction of apparent contrast induced by the parallel
surrounds was at least as great as that induced by the perpendicular surrounds. Often it was much
greater. (3) Orientation specificity increases with increasing spatial frequency and with decreasing
stimulus contrast.

Lateral  inhibition  Orientation  specificity  Contrast  perception  Texture  Scale  invariance 


INTRODUCTION 

Previously, we demonstrated that the perceived contrast
of a patch of isotropic, random visual texture is diminished
when that patch is embedded in a surrounding
background of similar texture (Chubb, Sperling &
Solomon, 1989). We also demonstrated that, for brief
flashes of the center and surround, this contrast inhibition
effect is strictly monocular. That is, when the
patch and the surrounding texture are presented to
different eyes, the apparent contrast of the center will not
be diminished. In addition, we showed that this effect
is spatial-frequency specific: when the spatial frequency
of the patch differs by an octave from the frequency of
the surround, then the apparent contrast of the patch is
influenced very little by the contrast of the surround.
These results suggest the existence, at some level of visual
processing, of laterally-interactive neural arrays tuned
to local contrast energy within relatively narrow spatial
frequency bands. Neural arrays of this type have also
been suggested by other psychophysical and physiological
studies (Chubb & Sperling, 1988, 1989; Shapley &
Victor, 1978; Enroth-Cugell & Jakiela, 1980; Ohzawa,
Sclar & Freeman, 1985; Sagi & Hochstein, 1985; Heeger,
1992).

The  present  research  describes  two  new  phenomena  of 
lateral  texture-contrast  interactions.  The  first  section 
(Expt  1)  demonstrates  that  signals  from  on-center  and 
off-center  visual  mechanisms  are  combined  prior  to 
processing  by  the  mechanism  which  mediates  the  lateral 
inhibition  of  perceived  contrast.  The  second  section 
(Expts  2-4)  demonstrates  that  the  neural  arrays  which 
compose  this  laterally  interactive  mechanism  are  tuned 
to  specific  orientations  of  spatial  texture,  and  measures 
the  orientation  specificity  as  a  function  of  the  contrasts 
of the center and surround.

GENERAL METHODS

In each experiment two subjects were run. Each
subject was a trained psychophysical observer (JS and
CC are experimenters). Each had normal or
well-corrected vision.

Psychology Department, Rutgers University, Busch Campus, New
Brunswick, NJ 08903, U.S.A.

Stimuli

Each  stimulus  consisted  of  a  circular  patch  of  texture 
(the  center)  surrounded  by  another  circular  patch  of 
texture  (the  surround).  The  mean  luminance  of  each 
center  and  surround  was  the  same,  and  equal  to  the 
background  of  the  display.  All  displays  were  presented 
at 60 frames/sec, and all stimuli were dynamic. That is,
new random phases of the textures in the center and
in the surround were selected every 1/15 sec. The images
were  created  using  both  specially  designed  programs  and 
the  HIPS  image-processing  software  package  (Landy, 
Cohen  &  Sperling,  1984). 

Apparatus

The  displays  for  the  experiments  were  presented  on 
three  different  monochrome  graphics  monitors  using  an 
Adage RDS 3000 image display system. In Expt 1,
subject CC used a Leading Technologies 1230V (12 in
diagonal) with a mean luminance of 90 cd/m², and
subject JS used a US Pixel PX-15 (15 in diagonal) with
a mean luminance of 40 cd/m². In Expts 2 and 3, both
subjects used a Princeton MAX-13 (14 in diagonal),
with a mean luminance of roughly 60 cd/m². In Expt 4,
both subjects used the US Pixel.

Calibration 

For  each  monitor,  luminance  linearization  was 
achieved  using  a  center/surround  display  comprised  of 
a  uniform  circular  patch  surrounded  by  an  annular 
background  containing  a  squarewave  pattern  of  spatial 
frequency  equal  to  that  of  the  sinusoidal  pattern  used 
in  Expts  2-4.  A  sheet  of  frosted  plastic  was  placed  in 
front of the monitor. At distances of 1 m or more, this
effectively  filtered  out  the  high  spatial  frequencies  in 
the  annular  surround,  and  both  center  and  surround 
appeared  uniform.  The  experimenter  set  the  maximum 
and  minimum  luminance  values  for  the  light  and  dark 
pixels  of  the  surround,  and  then  adjusted  the  luminance 
of  the  center  until  center  and  surround  were  no  longer 
distinguishable. The resulting center luminance is thus
halfway between the maximum and minimum luminances
of the display. Systematic iterations of this
technique yield displays with precisely calibrated contrasts.
In order to stabilize the monitor’s power draw
throughout  the  linearization  process,  two  separate 
center/surround  displays  were  shown  concurrently. 
When  establishing  a  relatively  high  luminance  value  on 
one  display,  the  corresponding  low  luminance  value  was 
established  on  the  other. 
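
The iterated midpoint matching described above can be sketched in a few lines. This is only an illustration under stated assumptions: `display_luminance` is a hypothetical monitor response (a power law with an arbitrary exponent and peak luminance), and `match_midpoint` stands in for the experimenter's visual match between the uniform center and the blurred squarewave surround. Neither function comes from the original software.

```python
def display_luminance(v, gamma=2.2, l_max=100.0):
    """Hypothetical monitor response: luminance (cd/m^2) for a drive level v in [0, 1]."""
    return l_max * v ** gamma


def match_midpoint(v_lo, v_hi, luminance=display_luminance, tol=1e-9):
    """Stand-in for the visual match: find the drive level whose luminance
    is halfway between the luminances produced by v_lo and v_hi."""
    target = 0.5 * (luminance(v_lo) + luminance(v_hi))
    lo, hi = v_lo, v_hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if luminance(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def linearize(v_lo, v_hi, depth):
    """Repeat the midpoint match recursively; returns drive levels whose
    luminances are equally spaced between the two end points."""
    if depth == 0:
        return [v_lo, v_hi]
    v_mid = match_midpoint(v_lo, v_hi)
    left = linearize(v_lo, v_mid, depth - 1)
    right = linearize(v_mid, v_hi, depth - 1)
    return left + right[1:]  # drop the duplicated midpoint
```

Calling `linearize(0.0, 1.0, 3)` returns nine drive levels; under the assumed power law their luminances are spaced evenly even though the drive levels themselves are not.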

Procedure 

The subject sat in a dark room and viewed the display
binocularly. The only source of illumination was the
light from the continuously illuminated display. The trial
sequence is illustrated in Fig. 1. Upon a key press, a
stimulus with a center and a surround was presented.
Then, the central texture was presented alone, then the
center/surround pair again and finally just the center
again. Each of the four presentations lasted 533 msec.
The subject's task was to make a forced-choice judgment.
The subject had to decide whether the central
grating had more contrast in the presence of the surround
or when it appeared alone. The subject indicated
his/her choice by pressing one of two buttons. There was
no limit to the time within which the subject had to give
his/her answer. In summary: the center appeared four
times in a trial, twice with the surround on (“masked
center”), and twice with the surround off (“test center”).
The subject’s task was to decide whether the apparent
contrast of the test center was greater or less than the
apparent contrast of the masked center.

FIGURE 1. Illustration of general procedure. Subject fixates on a cue
spot. Following a key press, eight frameblocks (four frames at 60
frames/sec) appear in one of the four center/surround texture
combinations. They are followed by eight frameblocks of just the
center texture (surround contrast equals zero). This 16 frameblock
sequence is then repeated. Presentation rate is 15 frameblocks/sec.
Immediately following the sequence a blank frame is presented, which
is terminated by the subject's response. The subject's task is to indicate
whether or not the center texture appeared to have more contrast in
the presence of the surround than when viewed in isolation.

We use “$c_t > c_m$” to denote a response indicating that
the apparent contrast of the test center was greater than
the apparent contrast of the masked center and “$c_t < c_m$” to
denote the response that the apparent contrast of the test
center was less than the apparent contrast of the masked
center. Consider the psychometric function mapping $c_t$,
the actual contrast of the test center, to P(“$c_t > c_m$”).
We determined two points on this function, the values
of $c_t$ for which P(“$c_t > c_m$”) = 0.62 and 0.38. This allows
us to estimate both the point of subjective equality [the value
of $c_t$ for which P(“$c_t > c_m$”) = 0.5] and the slope of the
psychometric function, which is a measure of the
intrinsic variability of the point of subjective
equality. To determine these points, we used a staircase
procedure in which the subject’s response on trial n is
used to determine the contrast of the test center on trial
n + 1.

For each stimulus, there were two interleaved staircases,
designated by their expected points of convergence
on the psychometric function: a 0.62 staircase and a
0.38 staircase. In the 0.62 staircase, the contrast of the
test center was decreased by one step size after every
“$c_t > c_m$” response. The contrast of the test center was
increased by one step size after two consecutive trials
yielding “$c_t < c_m$” responses. In the 0.38 staircase, the
contrast of the test center was decreased by one step
size after every “$c_t < c_m$” response. The contrast of the
test center was increased by one step size after two
consecutive trials yielding “$c_t > c_m$” responses.
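
The update rule of the two staircases can be written out as a short sketch. The class name, the response labels `">"` and `"<"` (standing for the two responses about the test and masked centers), and the `reset_after_up` flag are all illustrative. In particular, the text does not say whether the count of consecutive responses restarts after an increase, so that choice is exposed as a parameter and is an assumption of this sketch.

```python
class Staircase:
    """One of the two interleaved staircases controlling the test-center contrast."""

    def __init__(self, start, step, down_response, reset_after_up=True):
        self.contrast = start                 # current test-center contrast
        self.step = step                      # step size
        self.down_response = down_response    # response that lowers contrast every time
        self.reset_after_up = reset_after_up  # ASSUMPTION: not specified in the text
        self.run = 0                          # consecutive opposite responses so far

    def update(self, response):
        """Apply the response from trial n; return the contrast for trial n + 1."""
        if response == self.down_response:
            self.contrast -= self.step        # decrease after every such response
            self.run = 0
        else:
            self.run += 1
            if self.run >= 2:                 # two consecutive opposite responses
                self.contrast += self.step
                self.run = 0 if self.reset_after_up else 1
        return self.contrast


# The staircase called "0.62" in the text decreases after every ">" response;
# the one called "0.38" decreases after every "<" response.
stair_062 = Staircase(start=0.50, step=0.01, down_response=">")
stair_038 = Staircase(start=0.50, step=0.01, down_response="<")
```

In a session the two objects would be interleaved, each trial drawing its test-center contrast from one of them.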

Specifically, we measured the reduction of the masked
center’s apparent contrast induced by the presence of
the surround, as a percentage. The calculation of this
is shown here:

$$\text{percent reduction in apparent contrast} = 100 \times \frac{c_m - c_t}{c_m} \qquad (1)$$

where $c_m$ is the actual contrast of the masked center, and
$c_t$ is the actual contrast of the test center.
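
As a rough numerical illustration, the two staircase estimates can be turned into a point of subjective equality, a slope and a percent reduction. The function names are hypothetical, the straight-line interpolation between the two measured points is an assumption of this sketch (the text does not state how the estimate was formed), and the percent reduction is taken as the shortfall of the matching test contrast relative to the masked-center contrast.

```python
def pse_and_slope(c_t_at_hi, c_t_at_lo, p_hi=0.62, p_lo=0.38):
    """Straight-line estimate (an assumption) through the two measured points
    of the psychometric function; returns (PSE, slope)."""
    slope = (p_hi - p_lo) / (c_t_at_hi - c_t_at_lo)
    pse = c_t_at_lo + (0.5 - p_lo) / slope   # contrast where the line crosses 0.5
    return pse, slope


def percent_reduction(c_m, c_t):
    """Percent reduction in apparent contrast of a masked center of actual
    contrast c_m that is matched by a test center of actual contrast c_t."""
    return 100.0 * (c_m - c_t) / c_m


# Made-up example values, not data from the experiments.
pse, slope = pse_and_slope(c_t_at_hi=0.26, c_t_at_lo=0.24)
reduction = percent_reduction(c_m=0.5, c_t=pse)
```

Because 0.62 and 0.38 are symmetric about 0.5, the interpolated point of subjective equality is simply the midpoint of the two contrasts.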

For each viewing condition in each experiment, subjects
ran one block of 50 trials (at least six trials per
staircase)  using  a  step  size  of  value  ■^c^.  Then,  with  a 
smaller  step  size  (approximately  ^e,),  subjects  ran  as 
many  blocks  of  100  trials  as  necessary  (typically  3-5) 
until  the  variance  of  the  reversed  points  of  each  staircase, 
divided  by  the  square  root  of  the  number  of  reversals, 
was  no  greater  than  2.5%. 

EXPERIMENT  1: 

ON-CENTER/OFF-CENTER  INTERACTION 

The  fact  that  Chubb  et  al.  (1989)  observed  the 
induced  reduction  of  apparent  contrast  to  be  strictly 
monocular  in  their  conditions  suggests  it  is  a 
relatively low level visual process. This raises the possibility
that the lateral inhibition underlying the effect
might  be  occurring  at  the  level  of  on-center  and  off- 
center  retinal  ganglion  cells  or  LGN  cells.  If  so,  it 
seems  possible  that  the  inhibition  would  selectively  occur 
between  cells  of  the  same  contrast  polarity.  In  other 
words,  perhaps  on-center  cells  selectively  inhibit  other 
on-center  cells  and  off-center  cells  selectively  inhibit 
other  off-center  cells.  Experiment  1  investigates  this 
conjecture. 

Stimuli 

The  on-center  and  off-center  visual  pathways  work 
in  tandem  to  efficiently  code  information  about  contrast 
in the visual field. Both on-center and off-center ganglion
and  LGN  cells  maintain  a  steady  base  rate  of  firing, 
which  can  be  increased  or  decreased  by  appropriate 
stimuli.  It  seems  unlikely  that  contrast  information  from 
suprathreshold  stimuli  can  be  adequately  signaled  by 
decreases  in  the  base  firing  rates.  In  the  extreme,  no 
cell  can  distinguish  between  two  stimuli,  each  of  which 
has  sufficient  contrast  to  cause  a  complete  cessation  in 
firing. Less extreme stimuli may slow the firing rate
down enough so that the rate itself may only become
discernible to subsequent processing stages after some
considerable time (Enroth-Cugell & Robson, 1984).
However, stimuli of contrast which cause a decrease in
the firing rate of on-center cells should simultaneously
cause an increase in the firing rate of off-center cells, and
vice  versa.  Thus,  contrast  information  can  be  adequately 
coded  by  an  increase  in  the  firing  rate  of  one  of  the  two 
systems. 

Indeed,  selective,  pharmacological  blocking  of  on- 
center  cells  in  monkeys  has  been  demonstrated  (Schiller, 
Sandell  &  Maunsell,  1986)  to  severely  impair  detection 
of  bright  spots,  without  affecting  dark  spot  detection. 
This  finding  supports  the  notion  that  local  luminance 
increments  are  coded  by  the  on-center  system,  and  local 
luminance  decrements  are  coded  by  the  off-center  system. 

Two recent psychophysical studies with human subjects
supply further evidence for segregated processing
of local luminance increments and decrements. Malik
and Perona (1990) demonstrated that when one texture
is defined by patches composed of light bars with dark
sidebands, and another by dark bars with bright sidebands,
a boundary between the two textures is perceived
preattentively. Solomon and Sperling (1993) demonstrated
that one-third of the population can perceive the
motion  of  gratings  defined  by  the  same  textures  used 
in  the  current  experiment.  A  mechanism  having  a  linear 
function  of  stimulus  luminance  as  input  would  not  be 
able  to  segregate  Malik  and  Perona’s  textures  nor  extract 
motion  from  Solomon  and  Sperling’s  gratings.  Neither 
would  one  whose  input  equally  weights  local  luminance 
increments  and  decrements.  However,  performance  of 
these  tasks  can  be  modeled  by  a  mechanism  whose  input 
effectively  filters  out  either  local  luminance  increments  or 
local  luminance  decrements,  and  has  a  soft  activation 
threshold. 

Based  on  the  luminance-balanced  micro-elements  of 
Carlson, Anderson and Moeller (1980), two novel textures
were designed to investigate mechanisms which
receive  input  from  either  on-  or  off-center  neurons,  but 
not  both.  These  textures  consist  of  bright  or  dark  points 
on  gray  backgrounds.  In  theory,  bright  points  will 
selectively  increase  the  firing  rates  of  on-center  cells  in 
whose  receptive  field  centers  they  fall,  and  dark  points 
will  increase  the  firing  rates  of  off-center  cells  in  whose 
receptive field centers they fall. These textures are somewhat
similar to the stimuli used by Zemon, Gordon and
Welch (1988), in an attempt to differentially stimulate
the on- and off-center systems. Ours differ from the
textures used by Zemon et al. in that ours are designed
so that the level of adaptation of neurons in each
pathway remains constant, independent of the polarity
of the texture. This is accomplished by ensuring that the
mean luminance of all textures remains constant and
that phase (i.e. the positions of the bright and dark
points) is randomly determined every 1/15 sec. Unlike the
static textures used by Zemon et al., which were not
equated for mean luminance, our textures are designed
so  that  any  neuron  with  a  receptive  field  large  enough  to 
include  several  bright  or  dark  points  will  receive  the 
same  stimulation. 


2674 


JOSHUA  A.  SOLOMON  et  al. 


FIGURE 2. Illustration of “ON” and “OFF” textures. A vertical (or horizontal) slice through each texture is diagrammed.
Mean luminance L is indicated on the ordinates.

FIGURE 3. Stimuli for Expt 1. (a) On-center stimulating center, on-center stimulating surround (ON/ON). (b) On-center
stimulating center, off-center stimulating surround (ON/OFF). (c) Off-center stimulating center, on-center stimulating surround
(OFF/ON). (d) Off-center stimulating center, off-center stimulating surround (OFF/OFF).

*See the attached sheet for better figure.

Insofar as the on-center and off-center ganglion cells
can be modeled as having center-surround antagonism
and a (soft) threshold for firing, then the bright spots in
our textures will selectively increase the firing rates of
on-center cells in whose receptive field centers they fall,
and the dark spots will increase the firing rates of
off-center cells. Various plausible assumptions about the
responsiveness of on- and off-center systems make a high
degree of physiological selectivity likely. For example,
if  responses  are  proportional  to  the  second  power  of 
contrast  of  near-threshold  stimuli,  then  stimulation  of 
the  on-center  system  by  bright  points  should  dwarf  any 
concomitant  stimulation  of  the  off-center  system  by 
their  background.  Nonetheless,  the  true  physiological 
selectivity  of  these  textures  remains  to  be  tested. 

The  texture  designed  to  selectively  stimulate  on-center 
cells  is  comprised  of  a  regular  grid  of  bright  pixels 
(the  pixel  at  every  third  row  and  every  third  column  is 
bright).  This  texture  is  called  an  “ON”  texture.  The 
“OFF”  texture  is  designed  to  selectively  stimulate  off- 
center  cells;  it  is  comprised  of  a  regular  grid  of  dark 
pixels  (the  pixel  at  every  third  row  and  every  third 
column  is  dark).  The  luminances  of  the  other  pixels  in 
the  textures  are  chosen  so  that  the  mean  luminance  of 
the  ON  texture  is  equal  to  the  mean  luminance  of  the 
OFF  texture  (see  Fig.  2). 

The  stimuli  used  in  this  experiment  were  composed 
of  center/surround  combinations  of  these  textures. 
The positions of the pixel grids in each center and each
surround of each block of four frames (1/15 sec) were
randomly chosen from one of nine possible phases
(three horizontal positions times three vertical positions);
this produces a dynamically changing display
and the appearance of a jittering boundary between
center and surround.
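
The construction of the ON and OFF textures can be illustrated with a small sketch. The function name, the normalized luminance units and the value of `delta` are hypothetical. The rule used for the remaining pixels (each is offset by one eighth of the point's deviation, in the opposite direction) is an assumption that satisfies the equal-mean-luminance requirement stated in the text; the original software may have chosen the values differently.

```python
import random


def make_texture(rows, cols, polarity, mean_lum=1.0, delta=0.8, phase=None, rng=random):
    """Build one frameblock of an ON (polarity=+1) or OFF (polarity=-1) texture.

    Every third row and every third column holds a bright (or dark) point.
    The other eight pixels of each 3 x 3 cell are offset the opposite way by
    delta / 8 (ASSUMED rule) so that the mean over each cell stays at mean_lum.
    """
    if phase is None:
        # one of nine phases: three vertical times three horizontal positions
        phase = (rng.randrange(3), rng.randrange(3))
    point = mean_lum + polarity * delta
    other = mean_lum - polarity * delta / 8.0
    return [
        [point if (r % 3 == phase[0] and c % 3 == phase[1]) else other
         for c in range(cols)]
        for r in range(rows)
    ]


on_texture = make_texture(9, 9, +1, phase=(0, 0))
off_texture = make_texture(9, 9, -1, phase=(2, 1))
```

Drawing a fresh `phase` for every frameblock, separately for center and surround, gives the dynamically changing display described above.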

There  were  four  stimulus  combinations  corresponding 
to  two  different  types  of  surround  texture  times  two 
different  types  of  center  texture.  The  four  center/ 
surround  combinations  are  shown  in  Fig.  3.  ON  masked 
centers  (with  a  surround)  are  judged  only  relative  to  ON 
test  centers  (without  a  surround),  and  OFF  test  centers 
are  judged  only  relative  to  OFF  masked  centers. 

The stimuli were viewed from a distance of 0.67 m.
At this distance, for JS the surround subtended a visual
angle of 9.3 deg, and the center, 1.5 deg. For CC the
surround subtended a visual angle of 7.2 deg, and the
center, 1.2 deg.

Results  and  discussion 

The  results  for  both  subjects  are  plotted  in  Fig.  4. 
Two  points,  the  means  of  the  staircases  with  the  different 
convergence  points,  are  shown  for  each  center/surround 
combination.  The  lower  point  indicates  the  percent 
reduction  in  apparent  contrast,  as  determined  by  the 
0.38  staircase;  the  upper  point  indicates  the  0.62  stair¬ 
case.  Symbol  size  reflects  maximum  standard  error. 
Most  standard  errors  are  much  less  than  symbol  size. 

For each center/surround combination, both subjects
show more than a 50% reduction of the center's apparent
contrast induced by the surround. The mean percent
reduction (mean of 0.62 and 0.38 staircases) of apparent
contrast does not vary with center/surround combination.
A surround which is intended to excite only
the  off-center  visual  system  causes  the  same  degree  of 
reduction  in  the  apparent  contrast  of  a  center  which  is 
intended  to  excite  only  the  on-center  visual  system,  as 
does  a  surround  which  is  intended  to  excite  only  the 
on-center  system,  and  vice  versa. 


According  to  these  results,  the  neural  mechanism  that 
mediates  the  lateral  interactions  responsible  for  this 
reduction  of  apparent  contrast  combines  information 
from  both  the  on-center  and  the  off-center  pathways. 
The  mechanism  for  the  lateral  inhibition  of  perceived 
contrast  lies  central  to  the  point  of  on-center/off-center 
integration. 

EXPERIMENTS  2-4: 

ORIENTATION  SPECIFICITY 

The  procedure  we  use  here  was  motivated  by 
the  initial  observation  that  a  surround  grating  whose 
overall  contrast  is  temporally  modulated  will  cause  an 
apparent,  opposite  phase  modulation  in  the  contrast  of 
a  temporally  constant  target  grating.  When  two  target 
gratings  are  used,  one  with  orientation  parallel  to  that  of 
the  surround  and  one  with  orientation  perpendicular  to 
that  of  the  surround,  the  contrast  of  the  parallel  target 
seems to modulate more than the contrast of the perpendicular
target. Further observations suggested that the
disparity between contrast inhibition induced by
parallel and perpendicular surround gratings was
not always pronounced. Some stimulus parameters
are better than others at eliciting orientation specific
differences in contrast inhibition. The following
experiments were designed to confirm our initial
observations.

FIGURE 4. Results for subjects JS and CC, Expt 1. Abscissa: stimulus
configuration (ON/ON, OFF/ON, ON/OFF, OFF/OFF). For each stimulus
configuration, there is a 0.62 probability that the apparent reduction
in contrast induced by the presence of the surround is less than that
denoted by the upper point. Likewise, there is a 0.38 probability that
the apparent reduction in contrast induced by the presence of the
surround is less than that denoted by the lower point. Symbol size
reflects maximum standard error.

Stimuli 

All  the  stimuli  were  center/surround  combinations 
of sinewave gratings. For each 1/15 sec frameblock of the
stimulus, the phases of both the center and the surround
gratings were independent and randomly determined at
one of four possible phases. The sinewave gratings were
presented in one of two different orientations: either
slanted 45 deg in one direction or slanted 45 deg in the
other direction. There were four center/surround combinations,
corresponding to the two different orientations
of  surround  grating  times  the  two  different  orientations 
of  center  grating.  The  four  stimulus  combinations  are 
illustrated  in  Fig.  5. 

Procedures 

There  were  four  independent  variables:  center/ 
surround  orientation  (parallel,  perpendicular),  spatial 
frequency (3.3, 10, 20 c/deg), contrast of the surround $c_s$,
and contrast of the masked center $c_m$. Spatial frequency
was  varied  by  varying  viewing  distance;  this  had  the 
virtue  of  leaving  all  the  physical  characteristics  of 
the  display  intact  and  varying  only  the  retinal  scale. 
The  dependent  variable  was  the  percent  reduction  in 
apparent  contrast  of  the  center  induced  by  the  presence 
of  the  surround,  as  defined  in  equation  (1).  Viewing 
conditions  and  results  for  Expts  2-4  are  summarized  in 
Table  1. 
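
The dependence of retinal spatial frequency on viewing distance can be checked with a short calculation. The function names are hypothetical. The pairing of 20 c/deg with 4 m is taken from the text; the 2 m and 0.67 m distances used in the example are assumptions (0.67 m is borrowed from the viewing distance reported for Expt 1).

```python
import math


def cycles_per_degree(period_m, distance_m):
    """Retinal spatial frequency of a grating with the given physical period
    (metres on the screen) seen from the given distance (metres)."""
    deg_per_cycle = math.degrees(2.0 * math.atan(period_m / (2.0 * distance_m)))
    return 1.0 / deg_per_cycle


def period_for(frequency_cpd, distance_m):
    """Physical period (metres) that yields frequency_cpd at distance_m."""
    return 2.0 * distance_m * math.tan(math.radians(0.5 / frequency_cpd))


# A grating that is 20 c/deg at 4 m has a fixed period on the screen ...
period = period_for(20.0, 4.0)
# ... and a lower retinal frequency at shorter (assumed) viewing distances.
freq_at_2m = cycles_per_degree(period, 2.0)
freq_at_067m = cycles_per_degree(period, 0.67)
```

For angles this small the relation is very nearly linear in distance, so halving the distance halves the retinal frequency while the display itself is unchanged.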

The monitor used to display the stimuli in Expt 4 was
different from the one used to display the stimuli in
Expts 2 and 3. As a consistency check, both subjects
performed the 3.3 c/deg, $c_s = 1.0$, $c_m = 0.5$ viewing condition
with the new monitor. The resulting data were
indistinguishable from the initial data gathered in
Expt 2.

Only with the $c_s = 1.0$, $c_m = 0.5$ procedure was the
center grating visible at 4 m. (At this, the longest viewing
distance, the center grating had a spatial frequency of
20.0 c/deg.) Thus, this viewing distance was omitted
from all other procedures. Similarly, with the $c_s = 0.04$,
$c_m = 0.03$ procedure, the center grating was invisible
from 2 m. Thus, only the shortest viewing distance was
used in Expt 4.

Results

There were no systematic differences between the
responses to stimuli of reflectively symmetrical orientations;
therefore, these data have been pooled. That is,
the data from trials in which the center and surround
shared the same orientation have been pooled (parallel
configuration), and the data from trials in which the
center and surround were perpendicularly oriented have
been pooled (perpendicular configuration).

The results for Expts 2-4 are plotted in Figs 5 and 6.
As in Expt 1, two points are plotted for each configuration
to indicate the points of convergence of the 0.62
staircases and the 0.38 staircases.

Each  individual  graph  compares  the  reduction  in 
apparent  contrast  for  the  parallel  stimulus  configuration 
with  that  for  the  perpendicular  stimulus  configuration. 

Trends in the data. (1) For every stimulus configuration,
in every viewing condition, there is a statistically
significant (P < 0.005) percent reduction of the center's
apparent contrast induced by the surround.

(2) The difference between the heights of the
parallel-configuration points and the perpendicular-configuration
points on each graph is a measure of the
orientation specificity in that viewing condition. In all
of the viewing conditions, the percent reduction of
apparent contrast induced by the parallel surrounds is at
least as great as that induced by the perpendicular
surrounds. Often it is much greater.

TABLE 1. Viewing conditions and results for Expts 2-4. Columns: Experiment; Subject; Contrast ($c_s$, $c_m$); dva (Surround, Center); Spatial frequency (c/deg); Mean luminance (cd/m²); Contrast reduction (%) for Parallel (0.707, 0.293) and Perpendicular (0.707, 0.293).

(3) The left column of Fig. 5 represents all of the data
from the trials in which $c_s = 1.0$, $c_m = 0.5$. Note that an
increase in the viewing distance (and hence an increase
in the retinal spatial frequency of the gratings) results in
greater orientation specificity. This general trend obtains
for the other combinations of $c_s$ and $c_m$ (middle column
of Fig. 5, two leftmost columns of Fig. 6).

(4)  The  first  and  fourth  rows  of  Fig.  5  represent  data 
from  trials  in  which  the  retinal  spatial  frequency  of  the 
gratings was 3.3 c/deg, and the ratio of $c_s$ to $c_m$ was 2:1.
Note  that  a  decrease  in  stimulus  contrast  results  in  an 
increase  in  orientation  specificity.  This  general  trend  also 
holds  for  the  10.0  c/deg  stimuli  (second  and  fifth  rows  of 
Fig.  5). 

GENERAL  DISCUSSION 

Chubb  et  al.  (1989)  conducted  similar  experiments  to 
those  reported  here.  However,  they  used  patches  of 
isotropically  filtered  visual  noise  rather  than  sinusoidal 
gratings. Their principal findings were: (i) for high
contrast surrounds, when $c_m$ was roughly equal to half
the surround contrast, percent reduction of apparent
contrast was around 40%, provided that center and
surround were filtered into the same frequency band;
(ii) if the center and surround were presented to opposite
eyes, no induction occurred; (iii) if center and surround
were filtered into octave-wide frequency bands,
with  center  frequencies  one  octave  apart,  the  percent 
reduction  of  apparent  contrast  dropped  down  to  15%. 
This  third  result  indicates  that  the  reduction  of  apparent 
contrast  induced  by  the  presence  of  the  surround  is 
spatial  frequency  specific. 

The  current  experiments  investigate  the  degree  to 
which  this  reduction  of  apparent  contrast  induced  by  the 
surround  is  orientation  specific. 

Channels,  tuned  filters,  neurons 

Since  the  pioneering  work  of  Campbell  and  Robson 
(1968),  it  has  been  recognized  that  the  visual  system 
filters  the  visual  signal  into  a  number  of  relatively 
narrow  spatial  frequency  bands,  which  they  termed 
channels.  Each  of  these  channels  can  be  modeled 
approximately  as  an  array  of  linear  filters  with  all  filters 
in  the  array  sharing  the  same  receptive  field  profile,  but 
centered  at  different  retinal  locations  so  as  to  cover  the 
visual  field.  Each  of  these  filters  produces  a  positive 
or  negative  output  in  response  to  any  given  stimulus. 
Apparent  contrast  is  proportional  to  the  absolute  value 
of  filter  output. 

One  way  of  understanding  the  results  of  Chubb  et  al. 
(1989)  is  to  suppose  that  the  output  values  produced  by 
the  filters  in  these  arrays  are  subject  to  lateral  inhibition 
from  other  filters  in  the  same  array.  In  particular,  the 


higher  the  absolute  value  of  the  output  of  a  filter  in  such 
an  array,  the  greater  its  inhibitory  effect  on  other  filters 
near  it  in  the  array.  Thus,  high  contrast  regions  of  a 
narrow  band  texture  produce  regions  of  high  absolute 
value  in  the  filter  array  tuned  to  that  texture;  in  turn, 
these  regions  of  high  absolute  value  output  act  laterally 
to  damp  the  magnitude  of  the  output  values  produced  by 
filters  in  nearby  regions  of  the  array,  thereby  lowering 
the  apparent  contrast  of  the  inhibited  region. 

In  the  visual  system,  filters  are  realized  by  neurons. 
We assumed that, in each of our experimental conditions,
the observed percent reduction of apparent
contrast  depends  on  the  amount  of  lateral  inhibition 
delivered  to  neurons  tuned  to  the  center  texture  by 
neurons  tuned  to  the  surround  texture.  For  any  viewing 
condition,  the  observed  reduction  of  apparent  contrast 
induced  by  a  parallel  surround  is  always  at  least  as  great 
as  that  induced  by  a  perpendicular  surround;  we  thus 
infer that the neurons tuned to the parallel surround
deliver at least as much inhibition to the similarly tuned
neurons being stimulated by the center texture as
do the neurons tuned to the perpendicular surround.
That  is,  neurons  tuned  to  the  same  orientation  deliver 
more  inhibition  to  each  other  than  do  neurons  tuned  to 
different  orientations. 

Relations  to  physiology 

Physiological  studies  of  macaque  and  cat  have  yielded 
no evidence for any precortical orientation specificity
(Hubel & Wiesel, 1977). This restricts the neural locus
of  the  interaction  between  texture-sensitive  neurons. 
Equally  restrictive  is  the  result  that  surround-induced 
apparent  contrast  reduction  is  a  strictly  monocular 
effect. 

When  we  first  reported  that  the  lateral  inhibition 
of  perceived  contrast  does  not  spread  interocularly 
(Chubb  et  al.,  1989),  we  used  tests  involving  only 
band-passed  isotropic  texture  to  support  our  claim.  To 
ensure that this result held true for high frequency
gratings as well, we re-ran our “interocular induction”
experiment  with  two  subjects.  In  this  procedure  the 
center  and  surround,  both  20  c/deg,  were  presented  to 
different  eyes  in  a  continuous  display.  Here,  and  in  the 
interleaved  same-eye  control  trials,  center  and  surround 
were  separated  by  a  thin  gray  annulus  to  prevent  rivalry. 
The  surround  flashed  either  on  or  off  every  500  msec. 
Subjects  adjusted  the  contrast  of  the  surround-on  center, 
until  it  appeared  equal  to  that  of  the  surround-off  center. 
As  before,  this  manipulation  was  effective  in  removing 
any  noticeable  interaction  between  the  contrast  of  the 
surround  and  the  appearance  of  the  center.  Thus,  we 
maintain  that  the  neural  locus  for  the  lateral  interaction 
between  texture-sensitive  neurons  lies  at  an  early  cortical 
or precortical level of processing.

Physiological  studies  of  the  functional  architecture 
of  macaque  and  cat  visual  cortex  have  revealed  that, 
outside  of  layer  IV  in  area  17,  binocularly  driven  cells 
greatly  outnumber  monocularly  driven  cells.  Thus  we 
propose that it is the neurons of this layer which combine
texture information, in a spatially antagonistic
way, resulting in surround-induced apparent contrast
reduction.

FIGURE 7. Model for the lateral inhibition of perceived contrast. Central unit tuned to specific spatial frequency and
orientation. Excitatory component a is the dot product (correlation) of the stimulus with the receptive field of the central unit.
Surrounding units are tuned to a variety of frequencies and orientations. Their outputs are rectified and summed, giving
preferential weighting (indicated by the filters in the small boxes) to those units with spatial location, frequency and orientation
tuning similar to that of the central unit. The response r of the central unit is scaled with respect to this combination b. The
dotted arrow indicates a more complex model, in which the responses of surrounding units are scaled with respect to the
response of the central unit.

A  model 

Simple  theory:  one-way  interactions.  Figure  7  diagrams 
proposed  interactions  between  various  texture-sensitive 
units.  We  use  the  term  units  (rather  than  neurons) 
because neurons transmit only positive firing rates (positive
signals); it requires a push-pull pair of neurons
(a  neural  unit)  to  transmit  both  positive  and  negative 
signals.  Also,  we  do  not  differentiate  here  between  a 
single  neuron  and  many  similar  neurons  that  may  be 
acting  in  concert. 
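
The push-pull idea can be made concrete with a minimal sketch. The function names are hypothetical and the encoding (half-wave rectification of each sign) is an assumption chosen for illustration; the text says only that a pair of neurons is needed to carry both signs.

```python
def encode_push_pull(signal: float) -> tuple[float, float]:
    """Split a signed signal into two non-negative 'firing rates'."""
    on_rate = max(signal, 0.0)    # carries the positive part
    off_rate = max(-signal, 0.0)  # carries the negative part
    return on_rate, off_rate


def decode_push_pull(on_rate: float, off_rate: float) -> float:
    """Recover the signed signal as the difference of the two rates."""
    return on_rate - off_rate
```

Each member of the pair stays non-negative, yet the pair taken together (the "unit") conveys a signed quantity.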

The  central  unit  is  tuned  to  a  specific  spatial  frequency 
and  orientation.  The  excitatory  component  a  is  the  dot 
product  (correlation)  of  its  receptive  field  with  the 
stimulus.  The  surrounding  units  are  tuned  to  a  variety  of 
spatial  frequencies  and  orientations.  Their  outputs  are 
first  rectified  (absolute  value)  and  then  added  together 
giving  preferential  weighting  (indicated  by  the  filters 
diagrammed  in  the  small  boxes)  to  those  units  with 
spatial location, frequency and orientation tuning similar
to that of the central unit. The output response r of
the  central  unit  is  scaled  with  respect  to  the  rectified  sum 
of  surrounding  outputs  b.  We  consider  this  simple  model 
first,  and  then  a  more  complex  model  in  which  the 
interactions  are  reciprocal,  the  output  of  the  surround 
units  being  scaled  by  the  rectified  output  of  the  center 
unit. 
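
A minimal sketch of the one-way model follows. The function and argument names are hypothetical. The text says only that r is "scaled with respect to" b; dividing by (1 + b) is an assumption here, chosen as one simple divisive form, and is not claimed to be the authors' exact equation.

```python
def central_response(stimulus, receptive_field, surround_outputs, weights):
    """Return (a, b, r) for the simple one-way model.

    a: dot product of the stimulus with the central receptive field
    b: weighted sum of rectified (absolute-value) surround outputs
    r: central response after scaling by b (ASSUMED form: a / (1 + b))
    """
    a = sum(s * f for s, f in zip(stimulus, receptive_field))
    b = sum(w * abs(out) for w, out in zip(weights, surround_outputs))
    r = a / (1.0 + b)  # assumption: divisive scaling by the rectified sum
    return a, b, r
```

With no surround activity b is zero and r equals a; any surround activity, of either sign, reduces r.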

Since  virtually  every  viewing  condition  results  in  some 
reduction  in  apparent  contrast,  it  is  possible  to  construct 
a model that attributes a proportion p of this effect
to a balanced mixture of parallel and perpendicularly
oriented  units  that  have  precisely  equivalent  properties 
and  occur  in  precisely  the  same  numbers.  Consequently, 
any  orientation  specific  effect  must  be  attributed  to 
parallel  and  perpendicular  units  that  have  different 
properties and may occur in different numbers. Alternatively,
one could attribute a proportion q ≤ p of the total
observed  reduction  in  apparent  contrast  to  a  population 
of  unoriented  receptive  fields.  For  conceptual  simplicity, 
for  the  proportion  p  of  orientation-balanced  units,  we 
do  not  discriminate  the  balanced  mixture  of  parallel 
and  perpendicular  receptive  fields  from  a  functionally 
equivalent  mixture  of  unoriented  receptive  fields. 

The  model  portrays  inhibition  as  a  divisive  (shunting) 
form  of  gain  control  (Sperling  &  Sondhi,  1968;  Sperling, 
1970),  for  which  percent  reduction  in  apparent  contrast 
is  the  natural  dependent  variable.  The  model  is  similar 
in  spirit  to  the  models  proposed  by  Sperling  (1989)  and 
Heeger  (1992).  It  differs  in  three  respects:  it  deals  in 
detail  with  the  contrast  saturation  functions  that  limit 
lateral  interactions,  it  allows  for  orientation  specific 
normalization,  and  reciprocal  inhibitory  interactions 
between  center  and  surround  are  treated  explicitly. 

To  apply  the  simple  model  to  the  current  experiments, 
we  consider  the  equilibrium  state  when  a  masked  center 
(contrast $c_m$) with its surround (contrast $c_s$) is equated in
apparent contrast to the isolated test center (contrast $c_t$).
Because the surround inhibits the masked center, the
match is represented as

The functions $g_\theta$ are monotonically increasing functions
that represent the influence of the surround on the
center; $g_\theta = g_\parallel$ or $g_\perp$, depending on whether the orientation
of the center is parallel ($\parallel$) or perpendicular ($\perp$)
to the surround. The values of the weights $w_{i,\theta}$
depend on the relative orientations $\theta$ of the units, as well
as their retinal locations $i$. Solving equation (2) for $c_t$ and
substituting for $c_t$ in equation (1) yields

percent reduction in apparent contrast

$$= 100\, b' = 100 \sum_i w_{i,\theta}\, g_\theta(c_s). \qquad (3)$$


One obvious implication of equation (3) is that the
percent reduction in apparent contrast should be
independent of the contrast level $c_m$ of the matching
stimulus. This can be checked against the available data:
$c_s = 1$, $c_m = 0.5$ (Fig. 5) and $c_s = 1$, $c_m = 0.1$ (Fig. 6).
There is a tendency, quite large in some instances
(e.g. subject JS, 3.3 c/deg), for a smaller reduction in
apparent contrast to be associated with higher levels of
$c_m$. The observed variation of percent reduction in
apparent contrast with $c_m$ requires an elaboration of the
simple theory.

An approximation to a theory of fully reciprocal
interactions. A quite natural elaboration of the theory
of equation (2) is to consider that not only does the
surround inhibit the center but the center reciprocally
inhibits the surround. Because of its smaller size and
contrast, the center may exert less effect on the surround
than vice versa. A first-order approximation to
this reciprocal theory is simply to elaborate the term $b'$
[equation (2)] to a $b$ (no prime) that incorporates
reciprocal inhibition from the center:


percent reduction in apparent contrast

$$= 100\, \frac{\sum_i w_{i,\theta}\, g_\theta(c_s)}{1 + h(c_m)} \qquad (4)$$


and to use this $b$ instead of only its numerator [$b'$ in
equation (3)]. The function $h(c_m)$ is a monotonically increasing
function that represents the inhibitory effectiveness
of the center as a function of its output magnitude.

Equation  (4)  is  an  approximation  because  it  uses  only 
the  first  two  terms  of  an  infinite  series  of  indirect  effects 
in  which  the  reciprocal  feedback  of  the  center  affects  the 
surround which affects the center, etc. Indeed, the situation
is far more complex. The center is represented by
a large aggregate of diverse neurons, as is the surround.
Every neuron is involved with all of its neighbors in
reciprocal feedback interactions. This fully interactive
model  is  far  beyond  the  scope  of  the  present  paper,  both 
in  complexity  and  in  the  number  of  assumptions  that 
would  be  needed  to  fully  specify  the  model.  So,  we  stop 
with  the  first  two  terms.  In  this  two-term  approximation, 
the  effects  of  varying  the  contrast  of  the  center  (which 
are  represented  in  the  denominator)  are  separable 
from  the  effects  of  varying  the  contrast  of  the  surround 
(which are represented in the numerator). The function
$h$ absorbs the effect of level of matching contrast $c_m$ on
percent  reduction  in  apparent  contrast. 
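
The separability described above can be illustrated with a short sketch that reads equation (4) as 100 times the weighted surround term divided by one plus h of the matching contrast. The particular functions `g_example` and `h_example`, and the weight values, are hypothetical; the text requires only that g and h be monotonically increasing.

```python
def percent_reduction(c_s, c_m, weights, g, h):
    """Two-term approximation: surround term in the numerator,
    center (matching-contrast) term in the denominator."""
    b_prime = sum(w * g(c_s) for w in weights)  # surround influence
    return 100.0 * b_prime / (1.0 + h(c_m))     # reduced by center feedback


def g_example(c):
    """Hypothetical saturating surround function (not from the source)."""
    return min(c / 0.5, 1.0)


def h_example(c):
    """Hypothetical increasing center function (not from the source)."""
    return 2.0 * c
```

With h fixed at zero the prediction does not depend on the matching contrast, as in the simple theory. With an increasing h, a higher matching contrast gives a smaller percent reduction, which is the direction of the tendency noted in the data.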

Orientation  specificity.  The  surround,  of  course,  has 
the  biggest  role  in  determining  the  percent  reduction 
of  apparent  contrast  of  the  center.  We  now  consider 
the complex effects of surround contrast $c_s$, spatial
frequency $f$ of the center and surround, and relative
orientation ($\parallel$, $\perp$) of center and surround. These are
mediated by the functions $g_\parallel(c_s)$ and $g_\perp(c_s)$. The data
allow us to distinguish between three complementary
explanations of the relationship between inhibitory connections
between pairs of texture-sensitive units with
parallel  receptive  fields  and  pairs  of  texture-sensitive 
units  with  perpendicular  receptive  fields  (see  Fig.  8). 

(i) Early saturation: $g_\parallel(c_s) = g_\perp(c_s)$ and
$w_\perp = k_1 w_\parallel$, $0 < k_1 < 1$ [Fig. 8(a)]. The function $g_\parallel(c_s)$ mapping input
contrast to lateral inhibition for parallel surrounds and
the function $g_\perp(c_s)$ for connections between units tuned
to perpendicular surrounds are identical; only
their weights differ. The functions saturate at contrasts
$c_s < 1$.

(ii) Low efficiency (same intercept): $g_\perp(c_s) = g_\parallel(k_2 c_s)$
and $w_\perp = w_\parallel$, $0 < k_2 < 1$ [Fig. 8(b)]. The function mapping
contrast to lateral inhibition reaches the same
maximum  level  for  connections  between  units  tuned 
to  different  orientations  as  it  does  for  connections 
between  units  tuned  to  equal  orientations,  but  it  has  a 
smaller  slope  (lower  efficiency)  for  connections  between 
units  tuned  to  different  orientations  than  it  does  for 
connections  between  units  tuned  to  equal  orientations. 

(iii)  Low  efficiency  (non-saturating)  [Fig.  8(c)].  The 
linear  functions  shown  in  Fig.  8(c)  satisfy  the  conditions 
on  g  and  w  defined  in  both  (i)  and  (ii).  That  is,  the 
function  mapping  contrast  to  lateral  inhibition  is  strictly 
increasing.  It  reaches  a  different  maximum  level  for 
connections  between  units  tuned  to  different  orientations 
than  it  does  for  connections  between  units  tuned  to  equal 
orientations,  and  it  has  a  smaller  slope  (lower  efficiency) 
for connections between units tuned to different orientations
than it does for connections between units tuned
to  equal  orientations. 
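
The three explanations can be sketched as three families of contrast-to-inhibition functions. Everything numerical here is hypothetical: the piecewise-linear saturating shape, the saturation contrast `c_sat`, the scale factor `k`, and the function names are assumptions made for illustration, and (ii) is read as the perpendicular function being the parallel function evaluated at a scaled-down contrast.

```python
def saturating(c, c_sat):
    """Hypothetical function rising linearly and saturating at 1 for c >= c_sat."""
    return min(c / c_sat, 1.0)


def lateral_inhibition(hypothesis, c_s, w_par=1.0, k=0.5, c_sat=0.3):
    """Return (parallel, perpendicular) inhibition at surround contrast c_s."""
    if hypothesis == "ES":       # same function, smaller perpendicular weight
        g = saturating(c_s, c_sat)
        return w_par * g, k * w_par * g
    if hypothesis == "LE(SI)":   # same weight, perpendicular contrast scaled by k
        return w_par * saturating(c_s, c_sat), w_par * saturating(k * c_s, c_sat)
    if hypothesis == "LE(NS)":   # linear, never saturating, smaller slope
        return w_par * c_s, k * w_par * c_s
    raise ValueError(f"unknown hypothesis: {hypothesis}")
```

Under these assumptions, at maximal surround contrast ES still predicts an orientation difference, LE(SI) predicts none, and LE(NS) predicts inhibition that keeps growing with surround contrast.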

While  each  of  these  assumptions  about  the  nature 
of $g_\parallel$ and $g_\perp$ can account for much of the data, none
of them accounts for all. We consider now empirical
criteria which, when satisfied, would refute each of
these interpretations. One way to refute early saturation
is to demonstrate that, at high levels of surround contrast
(e.g. $c_s = 1.0$), there is no indication of orientation
specificity.  To  refute  low  efficiency  (same  intercept), 
it  is  sufficient  to  demonstrate  that,  at  high  levels  of 
surround contrast there is distinct orientation specificity.

[Figure 8 panels: (a) Early Saturation; (b) Lower Efficiency (Same Intercept); (c) Lower Efficiency (Non-Saturating)]

FIGURE 8. Three complementary relationships between the inhibition delivered by lateral connections to neurons of equal
and different orientations. (a) Early saturation: the function mapping contrast to lateral inhibition has a lower intercept (earlier
saturation) for connections between neurons tuned to different orientations (dashed line) than for connections between neurons
tuned to equal orientations (solid line). (b) Low efficiency (same intercept): the function mapping contrast to lateral inhibition
reaches the same maximum level for connections between neurons tuned to different orientations as it does for connections
between neurons tuned to equal orientations, but it has a smaller slope (lower efficiency) for connections between neurons tuned
to different orientations (dashed line) than it does for connections between neurons tuned to equal orientations (solid line).
(c) Low efficiency (non-saturating): the function mapping contrast to lateral inhibition is strictly increasing, reaches a different
maximum level for connections between neurons tuned to different orientations than it does for connections between neurons
tuned to equal orientations, and has a smaller slope (lower efficiency) for connections between neurons tuned to different
orientations (dashed line) than it does for connections between neurons tuned to equal orientations (solid line).


Low efficiency (non-saturating) can be refuted by
demonstrating that, for a given $c_m$, an increase in
surround  contrast  does  not  result  in  any  increase  in 
percent  reduction  in  apparent  contrast. 

For 20.0 c/deg stimuli, only one value of $c_s$ was tested,
so we cannot refute low efficiency (non-saturating).
However, we can refute low efficiency (same intercept)
because, for both subjects, distinct orientation specificity
is apparent in the data (Fig. 5).

For 10.0 c/deg stimuli, again we are able to refute low
efficiency (same intercept). There is distinct orientation
specificity when $c_s = 1.0$ for both subjects, especially
when $c_m = 0.1$ (Fig. 6). For 10.0 c/deg stimuli we are also
able to refute low efficiency (non-saturating). There is
no appreciable difference between the data from the
$c_s = 1.0$, $c_m = 0.1$ viewing condition and the $c_s = 0.2$,
$c_m = 0.1$ viewing condition, for either subject.

For  3.3  c/deg  stimuli,  however,  things  are  much  less 
clear  cut.  Both  subjects’  data  display  distinct  increases  in 
percent  reduction  in  apparent  contrast  with  an  increase 
in $c_s$. This prohibits us from discrediting low efficiency
(non-saturating). For JS, only with $c_m = 0.03$ does there
appear  to  be  some  orientation  specificity,  when  surround 


TABLE 2. Possible explanations of orientation-specific lateral
inhibition

                    Spatial frequency (c/deg)
Subject   3.3                   10.0   20.0
AS        ES; LE(SI); LE(NS)    ES     ES; LE(NS)
JS        ES; LE(SI); LE(NS)    ES     ES; LE(NS)

Explanations not discredited by the data are given in each cell.
ES, early saturation; LE(SI), low efficiency (same intercept); LE(NS),
low efficiency (non-saturating).


contrast  is  maximal.  Whether  or  not  this  orientation 
specificity  is  distinct  enough  to  refute  low  efficiency 
(same intercept) is a matter for debate. The most parsimonious
judgment is to accept all three explanations
as possibilities. For AS, only with $c_m = 0.1$ does there
appear to be any significant amount of orientation
specificity, when $c_s = 1.0$. Here again the best policy is
not to discredit any of the three explanations. A summary
of the possible explanations for each subject’s data,
at  each  spatial  frequency,  is  given  in  Table  2. 
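
The refutation criteria stated earlier can be written as a small decision rule that reproduces the entries of Table 2. The function name and the coding of the observations ("distinct", "debatable", "absent"; `None` when only one surround contrast was tested) are hypothetical conveniences, not the authors' notation.

```python
def surviving_explanations(specificity_at_high_contrast, grows_with_surround_contrast):
    """Return the explanations not discredited by the observations.

    specificity_at_high_contrast: "distinct", "debatable", or "absent"
    grows_with_surround_contrast: True, False, or None (only one value tested)
    """
    survivors = []
    if specificity_at_high_contrast != "absent":    # ES needs some specificity
        survivors.append("ES")
    if specificity_at_high_contrast != "distinct":  # LE(SI) forbids distinct specificity
        survivors.append("LE(SI)")
    if grows_with_surround_contrast is not False:   # LE(NS) needs continued growth
        survivors.append("LE(NS)")
    return survivors
```

Applied to the three spatial frequencies as described in the text, the rule returns all three explanations at 3.3 c/deg, ES alone at 10.0 c/deg, and ES with LE(NS) at 20.0 c/deg.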

CONCLUSION 

Chubb  et  al.  (1989)  demonstrated  that  the  lateral 
inhibition  of  perceived  textural  contrast  is  mediated  by 
arrays  of  neurons  that  are  narrowly  tuned  for  spatial 
frequency.  The  results  of  these  experiments  indicate  that 
they  are  tuned  for  orientation  as  well.  This  research  also 
clearly  indicates  that  the  mechanism  responsible  for  the 
lateral  inhibition  of  perceived  textural  contrast  receives 
equal  inputs  from  both  the  on-center  and  the  off-center 
visual pathways.

FIGURE 2. Illustration of “ON” and “OFF” textures. A vertical (or horizontal) slice through each texture is diagrammed.
Mean luminance is indicated on the ordinates.

Insofar as the on-center and off-center ganglion cells
can be modeled as having center-surround antagonism
and a (soft) threshold for firing, then the bright spots in
our textures will selectively increase the firing rates of
on-center cells in whose receptive field centers they fall,
and the dark spots will increase the firing rates of
off-center cells. Various plausible assumptions about the
responsiveness of on- and off-center systems make a high


FIGURE 3. Stimuli for Expt I. (a) On-center stimulating center, on-center stimulating surround (ON/ON), (b) On-center
stimulating center, off-center stimulating surround (ON/OFF), (c) Off-center stimulating center, on-center stimulating surround
(OFF/ON), (d) Off-center stimulating center, off-center stimulating surround (OFF/OFF).

Information Transfer in Iconic Memory Experiments

Karl R. Gegenfurtner and George Sperling


To report letters from briefly exposed letter arrays, subjects must transfer information from a
rapidly decaying trace (iconic memory) to more durable storage. In a partial-report paradigm, we
systematically varied the proportion (P) of trials with a long cue delay relative to a short cue delay.
Practiced subjects used the same transfer strategy independent of P. Data from a partial-report-plus-masking
experiment were used to construct a computational model that accurately predicted
partial- and whole-report performance with and without masks. Assumptions: Prior to a cue,
subjects attend primarily to the middle row of a three-row display, resulting in nonselective transfer.
After the cue, they attend only to the cued row. Transfer rate is the product of iconic legibility
(which depends on time and retinal location) and attention allocation (which shifts after a cue).
Cumulative transfer is limited by the capacity of durable storage.
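
The stated assumptions can be sketched as a toy simulation. This is not the authors' fitted model: the exponential legibility decay, every numerical parameter, the attention weights, and the function name are hypothetical, and the dependence of legibility on retinal location is omitted for brevity.

```python
import math


def simulate_partial_report(cue_delay_ms, cued_row, letters_per_row=3,
                            capacity=4.5, tau_ms=300.0, rate_per_ms=0.04,
                            pre_cue_attention=(0.25, 0.5, 0.25),
                            duration_ms=2000):
    """Return the proportion of the cued row held in durable storage."""
    stored = [0.0, 0.0, 0.0]  # letters transferred from each row
    for t in range(duration_ms):
        legibility = math.exp(-t / tau_ms)  # assumed iconic decay
        if t < cue_delay_ms:
            attention = pre_cue_attention   # mostly the middle row
        else:
            attention = tuple(1.0 if row == cued_row else 0.0 for row in range(3))
        for row in range(3):
            free_capacity = capacity - sum(stored)
            remaining = letters_per_row - stored[row]
            # transfer rate = legibility x attention, capped by what is left
            step = rate_per_ms * legibility * attention[row]
            stored[row] += max(0.0, min(step, free_capacity, remaining))
    return stored[cued_row] / letters_per_row
```

With these made-up parameters an immediate cue yields near-perfect report of the cued row, whereas a late cue finds durable storage already filled by nonselective transfer, so the cued top row remains well below half correct.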


When  subjects  are  asked  to  report  all  the  letters  they  can 
see  in  a  brief  flash  of  a  letter  array,  they  usually  can  report 
only four or five letters. The number of reported letters is
independent of the number of displayed letters (when more
than about five letters are displayed; e.g., Sperling, 1960).
One might therefore infer that the limit on the number of
letters reported is due to a limited memory capacity, traditionally
called the “span of apprehension” (Külpe, 1904;
Wundt, 1899). However, a partial-report procedure demonstrates
that subjects are able to store a dozen or more items
in a very short-term memory (Sperling, 1960).

In a typical partial-report experiment, a 3 × 3 letter matrix
is followed by a cue (e.g., a high-, middle-, or low-pitched
tone) that indicates the row of the matrix that the subject has
to report. Figure 1 shows results from a partial-report experiment.
When the cue occurs at the same time as the letters
or shortly afterwards, the subject can report all the letters in
the cued row. Because the subject does not know in advance
which row will be cued, perfect performance implies that all
the items are stored and still available at the time of the cue.
When the cue is delayed, partial-report performance decreases
until it finally reaches the level of whole report at
cue  delays  of  about  500-800  ms. 
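
The inference from partial report to the number of available letters is simple arithmetic: because the cued row is unpredictable, accuracy on that row is taken to hold for the whole array. A minimal sketch, with a hypothetical function name:

```python
def letters_available(proportion_correct, n_rows, n_cols):
    """Estimate letters available at cue time from cued-row accuracy."""
    return proportion_correct * n_rows * n_cols
```

For a 3 × 3 array, perfect cued-row accuracy implies nine available letters, whereas accuracy of one half implies 4.5, which is in the range of the four or five letters obtained in whole report.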

The decay of partial-report accuracy with cue delay has
been taken as evidence for a second kind of memory, which
Neisser  (1967)  called  “iconic  memory.”  Neisser  assumed 
that  initially  all  items  are  held  in  iconic  memory  and,  at  the 
time  of  the  cue,  the  cued  letters  are  transferred  into  a  longer 
lasting  storage. 

Durable  Storage 

The partial-report experiment itself does not prove the existence
of two different memories. The cue delay effect might
be  caused  by  one  type  of  memory,  which  decays  to  four  or 
five  letters,  and  the  subject  has  control  over  which  letters 
survive.  However,  experiments  with  a  poststimulus  mask 
(Averbach & Sperling, 1960; Sperling, 1960, 1963) show
that  there  is  more  than  one  memory.  When  a  poststimulus 
mask  comes  soon  after  stimulus  offset,  there  is  a  marked 
decrease in performance relative to the no-mask control condition.
Therefore the storage that is probed in the partial-report
experiment is destroyed by the mask. When a poststimulus
mask comes later, say, 1 s after the stimulus, it does
not  influence  partial  reports  at  all.  The  interaction  of  mask 
and  cue  delay  implies  that  there  are  at  least  two  types  of 
memory.  One,  iconic  storage,  has  a  large  capacity,  decays 
rapidly,  and  is  destroyed  by  a  mask  following  the  stimulus. 
The  other  storage  can  hold  only  a  limited  number  of  items 
but  is  not  affected  by  masking  and  seems  to  have  a  long 
lifetime. Following Coltheart (1980), we call the second type
of  memory  “durable  storage.” 

There have been several attempts to discriminate among
types of durable storage in partial-report and similar types of
experiments (Adelson & Jonides, 1980; Di Lollo, 1984; Duncan,
1983; Irwin & Yeomans, 1986; Kaufman, 1978; Loftus,
Johnson, & Shimamura, 1985; Mewhort, Campbell, Marchetti,
& Campbell, 1981; Scarborough, 1972; Sperling,
1967; Townsend, 1973). Similarly, there have been attempts
to discriminate between iconic memory as revealed by
partial-report experiments, which use informational measures,
and other kinds of visual sensory memory that might
be revealed by direct sensory judgments, which use synchrony
judgments of visual traces with auditory clicks

[Figure 1 plot; abscissa: Cue Delay in Milliseconds]

Figure 1. Method and results of a typical partial-report experiment.
(The abscissa is the time, relative to the onset of the letter
array, of a tonal cue to report a given row. The ordinate is the mean
proportion of correctly reported letters in the cued rows. Open
circles represent the performance of Subject BL. The filled bar on
the right indicates performance in a whole-report experiment. The
shaded area between the dashed line and the dots represents partial-report
superiority [relative to whole report]. The rightmost data
point indicates performance measured at a cue delay of 1,000 ms,
which is comparable to the delay involved in the whole-report
procedure.)

(Efron, 1970; Sperling, 1967), the integration time between
successive stroboscopic flashes (Eriksen & Collins, 1967;
Hogben & Di Lollo, 1974), and many other procedures (see
reviews by Coltheart, 1980; Long, 1980). Because we are
concerned only with partial-report experiments here, we
can bypass these issues and deal only with iconic memory
and durable storage without any further specifications or
subdivisions.


Selective  and  Nonselective  Transfer 

In  a  typical  iconic  memory  experiment,  at  intermediate  cue 
delays,  the  quality  of  the  iconic  image  has  deteriorated  so  that 
few  if  any  additional  items  can  be  transferred  into  durable 
storage.  If  the  cue  were  to  be  further  delayed  until  the  iconic 
image  had  decayed  completely,  no  transfer  from  the  cued  row 
would  be  possible,  and  performance  would  decrease  to  0.  As 
shown in Figure 1, this is not the case. Performance at long
cue  delays  reaches  asymptote  at  whole-report  performance, 
not  at  zero. 

The  failure  of  partial-report  accuracy  to  decay  to  zero  as 
a function of cue delay is almost certainly due to what Averbach
and Coriell (1961) called “nonselective readout.” In the
time  between  stimulus  and  cue,  subjects  start  to  transfer 
items  from  iconic  memory  to  durable  storage.  This  transfer 
is  nonselective  with  respect  to  the  cue.  It  saves  subjects  from 
performing  badly  at  long  cue  delays.  On  the  other  hand,  the 
letters transferred nonselectively use up some of the limited
capacity  of  durable  storage.  By  the  time  the  cue  appears, 
durable  storage  might  already  be  filled  with  items  from  the 
noncued rows. For example, about four items can be transferred
in the first 100 ms after stimulus display (Sperling,
1963, 1967). This is approximately the capacity of durable
storage.  Thus  the  strategy  of  nonselective  readout  at  short  cue 
delays  might  have  the  disadvantage  of  overcrowding  durable 
storage,  thereby  slowing  down  the  subsequent  transfer  from 
the  cued  row. 

As several authors (Coltheart, 1980; Dick, 1969; Hall,
1974; Merikle, Lowe, & Coltheart, 1971; Mewhort, Johns,
& Coble, 1991; Mewhort, Merikle, & Bryden, 1969; Sakitt,
1976; Sperling, 1960; Sperling & Dosher, 1986) have suggested,
it is possible that subjects deliberately use different
strategies  at  different  cue  delays.  On  trials  with  long  cue 
delays,  subjects  use  nonselective  readout,  to  avoid  the 
disaster  of  having  iconic  memory  decay  to  illegibility  before 
any  items  have  been  transferred.  At  short  cue  delays,  subjects 
pay  equal  attention  to  all  rows  and  do  not  transfer  items 
nonselectively,  to  avoid  filling  durable  storage.  Of  course, 
selective strategies would be possible only when partial-report
experiments are run in a blocked design, as they typically
have been (e.g., Irwin & Yeomans, 1986; Mewhort et
al., 1981; Sakitt, 1976; Sperling, 1960). In each block of
trials,  only  one  cue  delay  was  used;  thereby,  subjects  can  use 
the  strategy  that  is  most  advantageous  for  the  particular  cue 
delay. 


Overall  Plan 

In our first experiment, we attempted to discriminate between
short-cue-delay and long-cue-delay coding strategies
that subjects might use in iconic memory experiments and
to determine the costs and benefits of each strategy. It was
essential to resolve the possibility that subjects tailor their
strategy to the particular cue delay before we proceeded
with Experiment 2, a parametric investigation of partial-report
accuracy in 25 combinations of Cue Delay × Mask
Delay. The data of Experiment 2 enable us to define a
model that mathematically describes the time courses of
iconic decay and the twin processes of selective and nonselective
retrieval. The model, which aggregates all the
rows of a three-row stimulus, is very successful computationally,
but it contains two enigmas. These enigmas are
resolved  by  noting  that  characters  from  the  middle  row  are 
transferred  to  durable  storage  much  more  rapidly  than 
characters  from  the  other  rows  and  embodying  this  fact  in 
an  elaborated  model  that  treats  each  row  separately.  We 
consider  how  our  model  differs  from  prior  computational 
approaches  to  iconic  transfer.  Finally,  we  show  that  our 
formulation  of  the  role  of  attention  in  selective  transfer  is 
consistent  with  many  other  attentional  phenomena. 


Experiment  1:  Coding  Strategies 

Suppose  that  subjects  in  an  iconic  memory  experiment  use 
a strategy of selective transfer at short cue delays and a strategy
of nonselective transfer at long cue delays. We wished
to estimate the cost of each strategy when it was used inappropriately
and the benefit of each strategy when it was
used appropriately in terms of the cost-benefit analysis of
Posner, Nissen, and Ogden (1978). The problem in estimating
the cost of nonselective transfer at short cue delays was
to get the subject to use nonselective transfer (an inappropriate
strategy) at short cue delays. Here is the trick. In a
blocked situation with only long cue delays, we assumed
subjects would certainly use nonselective transfer (the appropriate
long-delay strategy). When we included very few
trials with a short cue delay in such a block, subjects were
still better off from a decision-theoretic viewpoint using nonselective
transfer throughout the whole session. Assuming
that subjects would try to maximize their performance, we
predicted they would use the long-delay strategy in blocks of
predominantly long delays and the short-delay strategy in
blocks of predominantly short delays. The difference in performance
between (a) the few short-delay trials embedded in
a  block  of  predominantly  long-delay  trials  and  (b)  a  pure 
block  of  short-delay  trials  provided  an  estimate  of  the  cost 
of  using  nonselective  transfer  (the  inappropriate,  long-delay 
strategy)  at  short  cue  delays. 

A  similar  argument  was  posited  for  long  cue  delays.  We 
predicted  that  when  a  short  cue  delay  was  presented  93%  of 
the time, subjects would use selective transfer (the appropriate
strategy). On the occasional long cue delays, we expected
their performance to be very poor. Figure 2 illustrates
a  predicted  outcome  of  this  sort  of  experiment.  Performance 
at  long  cue  delays  was  expected  to  decrease  markedly  with 
a  reduction  in  the  probability  of  the  occurrence  of  a  long  cue 
delay.  Performance  for  both  types  of  cue  delay  was  expected 
to  be  highest  in  the  blocked  design  condition. 
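
The decision-theoretic argument can be sketched numerically. The accuracy values in `HYPOTHETICAL` are invented for illustration only, as are the function names; the point is that a subject who maximizes expected accuracy should switch strategy as the probability of a long cue delay changes.

```python
def expected_accuracy(p_long, acc_short, acc_long):
    """Expected proportion correct in a block with P(long cue delay) = p_long."""
    return (1.0 - p_long) * acc_short + p_long * acc_long


def best_strategy(p_long, strategies):
    """Pick the strategy with the highest expected accuracy in the block."""
    return max(strategies,
               key=lambda name: expected_accuracy(p_long, *strategies[name]))


# strategy -> (accuracy at short cue delay, accuracy at long cue delay)
HYPOTHETICAL = {
    "selective": (0.95, 0.20),     # best at short delays, poor at long delays
    "nonselective": (0.70, 0.50),  # safer at long delays, costly at short delays
}

# Cost of using the long-delay strategy on a short-delay trial
cost_at_short_delay = HYPOTHETICAL["selective"][0] - HYPOTHETICAL["nonselective"][0]
```

With these numbers the selective strategy wins when long delays occur on 7% of trials and the nonselective strategy wins when they occur on 93% of trials; the cost term corresponds to the difference between pure-block and embedded short-delay performance described in the text.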


Figure  2.  Strategy  analysis:  The  expected  outcomes  of  the  cost 
analysis  experiment  in  which  either  a  long  or  a  short  cue  delay  can 
occur  on  a  trial.  (The  abscissa  is  the  probability,  within  a  block  of 
trials,  of  the  long  cue  delay.  The  ordinate  is  the  mean  proportion  of 
correct reports, conditioned on the type of cue [long vs. short]
delay. The upper curve is the expected performance with short cue
delays;  the  lower  curve  represents  long  cue  delays.  Performance  at 
long  cue  delays  is  expected  to  be  poor  when  long  cue  delays  occur 
rarely  [bottom  left]  and  to  increase  as  they  become  predominant. 
Performance  at  short  cue  delays  is  symmetrically  opposite.  The 
solid  bars  through  the  data  points  indicate  standard  errors.  Their 
differing  length  indicates  that  in  the  low-probability  conditions, 
fewer  trials  will  be  available.) 


Method 

Subjects.  Two  graduate  and  two  undergraduate  students  at  New 
York University participated in the experiment for pay. All subjects
had normal or corrected-to-normal vision. Each subject had a minimum
of five practice sessions of 200 trials each; for some subjects,
practice continued longer until their performance in a regular
partial-report experiment with equally likely cue delays reached a
steady state. Subjects BL and FC were presented 3 × 3 arrays.
Performance for subjects RS and BF was better, so they were shown
3 × 4 arrays.

Stimuli. All experiments were controlled by a Digital Equipment
Corporation PDP-11/23 computer. The letters were presented
on a Hewlett Packard 1310A cathode ray tube (CRT) with a fast
white P4 phosphor. The CRT was driven by a specially designed
display interface (Kropfl, 1975) and software for real-time vision
experiments (Melchner & Sperling, 1980). Tones were presented on
Sennheiser HD414 headphones. A Wavetek Model 159 waveform
generator was used to generate the tones, which were set to a comfortable
listening level. The timing of the actual stimulus sequences
was  verified  by  independent  oscilloscopic  measurements  and  was 
accurate  to  within  1  ms. 

The stimuli consisted of a 3 x 3 or 3 x 4 array of letters. Figure
3a shows a photograph of a typical stimulus. The whole display
extended 3.1° or 4.5° of visual angle, respectively, at a viewing
distance of 128 cm. Each letter was 1.2 cm high and 1.0 cm wide,
with a distance between letters of 2.0 cm horizontally and 1.8 cm
vertically. Viewing was binocular.
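As a check on the stated display geometry, the horizontal visual angle can be recomputed from the letter dimensions and the viewing distance. The sketch below assumes that the 2.0 cm "distance between letters" is the edge-to-edge gap between adjacent letters (the text does not say so explicitly); the function names are illustrative.

```python
import math

def array_width_cm(n_columns, letter_width_cm=1.0, gap_cm=2.0):
    """Total width of one row of letters.

    Assumption: gap_cm is measured edge to edge between neighbouring letters.
    """
    return n_columns * letter_width_cm + (n_columns - 1) * gap_cm

def visual_angle_deg(extent_cm, distance_cm):
    """Visual angle subtended by an extent centred on the line of sight."""
    return math.degrees(2.0 * math.atan(extent_cm / (2.0 * distance_cm)))

for n_columns in (3, 4):
    width = array_width_cm(n_columns)
    print(n_columns, width, round(visual_angle_deg(width, 128.0), 1))
```

With this reading, the three-column display is 7 cm wide and the four-column display 10 cm wide, which correspond to about 3.1° and 4.5° at 128 cm, matching the values given in the text.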

The luminance of the letters was determined by measuring the
luminance of a uniform rectangle with a United Detector Technologies
photometer, which had been calibrated against a standard light
source. The rectangle had the same pixel intensity as the letters, the
same pixel spacing, and the same number of dots as the letter
bitmaps. The measured luminance was 34 cd/m^2. The letters were
displayed on a dark background of approximately 0.05 cd/m^2. The
room was dimly illuminated, and the wall behind the monitor had
a luminance of approximately 1.2 cd/m^2. The individual letters were
randomly chosen without replacement from the set of 20 consonants,
excluding Y.

Procedure. Each partial-report session consisted of 200 trials.
Figure 4a shows a flow diagram for one trial. The subject initiated
the trial by pressing a button. After a random interval of 1.0-1.5 s,
the stimuli were displayed for 50 ms (five repeated frames at 10 ms
per frame). At the time specified by the cue delay, a tone was
sounded on the headphone for 100 ms. The frequencies of the cue
tones were 225, 600, and 975 Hz for the bottom, middle, and top
row, respectively. The time for the cue delay was measured from the
onset of the stimulus. Typically, cue delays in partial-report
experiments have been specified in terms of the time from stimulus
termination (e.g., Sperling, 1960, and many others). Our reason for
specifying a cue delay relative to stimulus onset was that the delay
then corresponded to the time for which stimulus information was
available before a cue appeared.

Cue delay could be varied independently of the other stimulus
parameters, and the cue could occur before stimulus onset, during
the stimulus, or after stimulus termination. After the stimulus
sequence, the subject was prompted on the screen for a typed
response. After the subject responded, the correct letters were shown
on the screen, together with the subject's response. Then the next
trial started. A response letter was scored as correct only when it was
reported in the correct serial position.

In this experimental design, it is inevitable that the
low-probability condition for one cue delay coincides with a high
probability for the other cue delay. Therefore the number of observations
for the low-probability conditions is smaller. Performance was
evaluated for cue delay probabilities of 0.1, 0.5, 0.9, and 1.0. The short
cue delay used was always 0 ms. The duration of the long cue delay
was chosen separately for each subject so as to achieve a performance
level that would still be better than whole report. The delay
values were 400, 800, and 1,000 ms.

Figure 3. Panel a shows a typical stimulus display. (In both Experiments 1 and 2, all letters were
white on a black background; their contrast is reversed here for better reproducibility. Subjects BF
and RS used a 3 x 4 matrix of letters; Subjects PC and BL used a 3 x 3 matrix.) Panel b shows
a mask. (Like the stimulus, the mask is shown in reversed contrast. Each component masking pattern
consists of five different letters displayed in extremely rapid succession.)

Results 

The data were analyzed separately for each subject. Figure
5 illustrates the results for subjects PC and BF. Table 1
summarizes the data for all 4 subjects. If there is no effect of
probability of occurrence, then performance should not vary
and all data points for a fixed cue delay should fall on a
straight horizontal line. We therefore estimated slope and
intercept of the best fitting lines (in the least squares sense)
through the data.
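The slope estimate can be illustrated with an ordinary (unweighted) least squares fit. The sketch below uses the four proportions listed in Table 1 for subject BF at a cue delay of 0 ms; whether the authors weighted the points by the number of observations is not stated, so the unweighted fit is an assumption and the function name is illustrative.

```python
def least_squares_line(xs, ys):
    """Return (slope, intercept) of the unweighted least squares line through (xs, ys)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Probability of the cue delay and proportion correct (Table 1, BF, 0 ms).
probabilities = [0.1, 0.5, 0.9, 1.0]
proportion_correct = [0.897, 0.903, 0.864, 0.904]

slope, intercept = least_squares_line(probabilities, proportion_correct)
print(round(slope, 3), round(intercept, 3))
```

The unweighted slope comes out near -0.014, close to the -0.015 listed in Table 1 for this row; the small difference may reflect a weighting or rounding choice that the text does not describe.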

Table 1 shows that the slopes of least-squares-estimated
lines through the data points were all negligibly small. Seven
of eight slopes were negative, and none of them was
significantly different from 0 (according to t tests at the .05
significance level).

Discussion 

Three unambiguous aspects of the data lead to three
significant conclusions.

Performance in response to a low-probability long-delay
cue did not approach zero but reached asymptote at a level
typical for whole report. This means that subjects always
used nonselective transfer.

Performance  in  response  to  a  low-probability  short-delay 
cue  was  not  impaired  compared  with  that  in  response  to  a 
high-probability  short-delay  cue.  We  infer  that  nonselective 
transfer  did  not  involve  any  additional  cost  for  the  subject 
even  on  trials  in  which  selective  transfer  was  also  used. 


The  finding  that  performance  was  better  for  short-  than 
long-delay  cues  indicates  that  subjects  indeed  used  selective 
transfer  for  short-delay  cues. 

Finally, with respect to experimental procedures, if
subjects have more than one good strategy available, the
particular mixture of strategies that they use would depend on
the particular mixture of cue delays they confront. The
finding that subjects used the same strategy for short- and
long-delay cues greatly simplified the design of
Experiment 2. The mixture of cue delays could be optimized for
obtaining the desired data, unconstrained by (non)effects
on subjects' strategies.

The finding that a single transfer strategy was used at all
cue delays is in striking contrast to previous observations
suggesting at least two strategies. Sperling (1960, Figure 5)
showed a subject whose short-delay-cue strategy failed
against long-delay cues and whose long-delay-cue strategy
failed to take advantage of short-delay cues. Another (famous)
subject (Sperling, 1990, Figure 6) retained his
short-delay-cue strategy for too long a cue delay, thereby
producing a nonmonotonic iconic decay function. The simplest
explanation for the discrepancy between the present data and
Sperling's data is that the earlier data were obtained in the
first few hundred trials with naive subjects whose
performance was clearly nonoptimal. The present data show that,
after practice, subjects acquire a single strategy that is
effective for both long and short cue delays.


Experiment  2:  Time  Course  of  Iconic  Memory 

The results of Experiment 1 left us with two open
questions: How do subjects avoid overfilling durable storage
when selective transfer follows nonselective transfer? More
generally, how are nonselective transfer and selective
transfer combined? To address these questions, we introduced a
variably delayed poststimulus mask into the partial-report
procedure.

Figure 4. Panel a is a flow chart for a trial. (The three parallel streams for letters, cue, and mask
indicate that the onset times for these could be varied independently to produce any arbitrary
ordering. The mask was used in Experiment 2 only.) Panel b is a flow chart for the production of
a mask. (A sequence of five different frames is painted with 6-ms interframe intervals; the sequence
of five is repeated 20 times.)

An appropriately chosen visual postexposure masking
stimulus should have two properties: It should destroy the
contents of iconic memory but leave durable storage
unimpaired. For the destruction of iconic memory, a mask is
constructed in such a way that when it and the test stimulus are
exposed simultaneously, the test stimulus is masked to the
point of unintelligibility (Kahneman, 1968; Sperling, 1963).
The ability to mask the test stimulus completely when it is
strongest (i.e., when it is physically present) implies that the
postexposure masking stimulus will even more effectively
mask the test after it has been weakened by decay. The ability
to leave durable storage unimpaired is demonstrated by
showing that long mask delays yield equivalent performance
to no-mask control conditions. Such a poststimulus mask
serves to limit the time for which information from iconic
memory is available for transfer to durable storage. By
varying cue delay and mask delay independently in a crossed
design, we obtained estimates for the amount of transfer to
durable storage in each interval.

Figure 6 illustrates the logic of the masking paradigm in
three kinds of conditions. In the first condition (Figure 6a),
the cue occurs after stimulus onset and before mask onset.
During the interval between stimulus onset and the cue, the
subject does not know which row will be cued. Therefore,
all transfer is nonselective with respect to the cue. After
the cue has occurred, the subject switches attention to the
cued row and transfers letters selectively from that row.
We call these two kinds of information transfer from
iconic memory to durable storage nonselective and
selective transfer, respectively.

Two special cases lead to pure selective and pure
nonselective transfer. When the mask comes before or at the same
time as the cue, only nonselective transfer occurs (Figure 6b).
When the cue comes at or before stimulus onset, subjects use
selective transfer throughout (Figure 6c). In all other cases
(Figure 6a), there is a mixture of selective and nonselective
transfer. Because the cue is irrelevant to nonselective
transfer, the pure nonselective conditions should yield the same
results as a whole-report experiment with similarly delayed
poststimulus masks. This whole-report experiment was
carried out as a control condition.

Figure 5. The absence of strategy effects. (Results of Experiment 1 are shown for Subjects PC and
BF. The ordinate is the proportion of correct reports; the abscissa is the probability of the long cue
delay in a block of 200 trials. The two lines in each panel are the best fitting horizontal lines to the
data for each cue delay. The vertical bars represent the standard error [±1 σ].)
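The three kinds of conditions distinguished in Figure 6 follow directly from the ordering of cue and mask relative to stimulus onset. A minimal sketch of that classification is given below; the function name and the string labels are illustrative, and delays are taken to be in milliseconds from stimulus onset, with the mask assumed to follow stimulus onset.

```python
def transfer_condition(cue_delay_ms, mask_delay_ms):
    """Classify a trial by the kind of transfer it permits (cf. Figure 6).

    Both delays are measured from stimulus onset; mask_delay_ms is assumed > 0.
    """
    if cue_delay_ms <= 0:
        # Cue at or before stimulus onset: selective transfer throughout (Figure 6c).
        return "pure selective"
    if cue_delay_ms >= mask_delay_ms:
        # Mask before or simultaneous with the cue: nonselective only (Figure 6b).
        return "pure nonselective"
    # Cue after stimulus onset but before the mask (Figure 6a).
    return "mixed"

print(transfer_condition(0, 300))
print(transfer_condition(300, 200))
print(transfer_condition(100, 400))
```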


Method 

The general experimental methods and subjects were the same as
in Experiment 1 except for the following changes.

Subjects. Two subjects, BL and BF, who had served in
Experiment 1, served again in Experiment 2. It should be remarked
that BF was able to report one or two items more than average from
brief visual exposures. This would place him in the upper 10-20%
of subjects in our experience. He was persuaded to serve in this
tedious experiment in our hope of discovering some other unusual
ability. However, except for a slightly higher level of performance,
his data were typical in all respects. In addition, the two other
subjects from Experiment 1 served for about half as many trials as BL
and BF. Their data did not differ in any important ways from those
of Subjects BL and BF and are not presented here.


Masking stimulus. In Experiment 2, a masking pattern (the
mask) was displayed at a specified mask delay, which could be
shorter or longer than the cue delay. A mask consisted of five
different letters displayed in extremely rapid succession at each
spatial location, so that the letters were summed by the visual system
and could not be recognized individually (Budiansky & Sperling,
1969). All the letters comprising one frame of a mask were
painted within 6 ms, a new frame was presented every 6 ms, and
the sequence of five different frames was repeated 20 times for a
total mask duration of 600 ms. The flow diagram in Figure 4
illustrates this process. The intensity of masks, measured in the
same way as the intensity of the letters in Experiment 1, was 47
cd/m^2. Figure 3b illustrates a typical masking pattern. In a brief
control experiment, it was verified that recognition of a stimulus
letter was at chance when it was presented at the same time as a
mask.
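The stated durations follow from the frame timing. The short sketch below recomputes the 50-ms stimulus duration from Experiment 1 and the 600-ms mask duration from the numbers given in the text; the constant names are illustrative.

```python
# Stimulus: five repeated frames at 10 ms per frame.
STIMULUS_FRAMES = 5
STIMULUS_FRAME_MS = 10

# Mask: five different frames, one every 6 ms, the sequence repeated 20 times.
MASK_FRAMES_PER_SEQUENCE = 5
MASK_FRAME_MS = 6
MASK_REPETITIONS = 20

stimulus_duration_ms = STIMULUS_FRAMES * STIMULUS_FRAME_MS
mask_sequence_ms = MASK_FRAMES_PER_SEQUENCE * MASK_FRAME_MS
mask_duration_ms = mask_sequence_ms * MASK_REPETITIONS

print(stimulus_duration_ms, mask_sequence_ms, mask_duration_ms)
```

Each pass through the five mask frames therefore takes 30 ms, so every spatial location shows all five letters within a window much shorter than the stimulus exposure itself.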

Procedure. Mask delays of 100, 200, 300, 400, and 500 ms
were used in the experiment. The cue delays chosen were 0, 100,
200, 300, and 400 ms. On each trial, a cue delay and mask delay
were chosen randomly in a mixed-list design. Each subject was
tested on approximately 5,000 trials in 45-min long sessions of 200
trials each.
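The crossed design can be enumerated to see how many cue-delay by mask-delay cells it contains and roughly how many trials fall in each. The sketch assumes that all cells were equally likely, which the text does not state explicitly.

```python
from itertools import product

CUE_DELAYS_MS = (0, 100, 200, 300, 400)
MASK_DELAYS_MS = (100, 200, 300, 400, 500)
TOTAL_TRIALS = 5000  # approximate number of trials per subject quoted in the text

cells = list(product(CUE_DELAYS_MS, MASK_DELAYS_MS))

# Cue at stimulus onset: selective transfer throughout.
pure_selective = [(c, m) for c, m in cells if c == 0]
# Cue simultaneous with or after the mask: nonselective transfer only.
pure_nonselective = [(c, m) for c, m in cells if c > 0 and c >= m]
# Cue after stimulus onset but before the mask: both kinds of transfer.
mixed = [(c, m) for c, m in cells if 0 < c < m]

# Assumption: trials are spread evenly over the cells.
trials_per_cell = TOTAL_TRIALS / len(cells)

print(len(cells), len(pure_selective), len(pure_nonselective), len(mixed), trials_per_cell)
```

Under the equal-probability assumption each of the 25 cells receives about 200 trials, which is in line with the 150-230 trials per data point reported in the caption of Figure 7.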


Table 1

The Proportion of Correctly Reported Letters as a Function of the Probability of Cue Delays in Experiment 1

                             Probability of cue delay
          Cue delay       0.1            0.5            0.9            1.0
Subject   (ms)          P       n      P       n      P       n      P       n     Slope

BF            0       0.897    92    0.903   283    0.864   736    0.904   200    -0.015
          1,000       0.719    64    0.648   317    0.63    708    0.398   200    -0.012
PC            0       0.9      40    0.887   140    0.879   360    0.863   200    -0.026
            400       0.737    40    0.743   140    0.727   360    0.663   200    -0.082
BL            0       0.884    23    0.916   103    0.923   181    0.912    80     0.034
            800       0.667    19    0.581    97    0.605   177    0.648    53    -0.026
RS            0       0.983    35     --      --    0.956   244    0.992    32    -0.068
          1,000       0.717    23     --      --    0.714   365    0.631    80    -0.06

Note. P = proportion of correctly reported letters; n = No. of observations. Slope data indicate the slope of a least squares fitted line through the data points.

Figure 6. The logic behind the partial-report-plus-mask experiment.
(Panel a shows nonselective and selective transfer. The cue
occurs before the mask, nonselective transfer occurs before the
cue, and selective transfer occurs during the interval cue to the
mask. Panel b shows pure nonselective transfer. The cue occurs at
or after the onset of the mask. Nonselective transfer ceases after
onset of the mask; there is no resumption of transfer after the cue
occurs. Panel c shows pure selective transfer. The cue comes at or
before the onset of the stimulus.)

Whole report. In the whole-report condition, the subject was
asked to report all the letters in the display. The same mask used
in the partial-report condition was used. The whole-report practice
and test conditions were run in separate sessions after the subjects
were already practiced in the partial-report task. Data were collected
only after performances had reached asymptote.

Results 

As in Experiment 1, the data were analyzed separately for
each subject. Figure 7 shows the effect of cue delay with
mask delay as a parameter. As in other partial-report
experiments, performance dropped as cue delay increased,
confirming that subjects made efficient use of the cue. A strictly
monotonic decrease in proportion correct as a function of cue
delay means that using selective transfer in any time interval
yielded more correctly reported letters than did nonselective
transfer.

Figure 8 replots the data of Figure 7 with mask delay as
the abscissa and cue delay as a parameter. The effect on
performance of mask delay was also monotonic; the number
of transferred letters increased rapidly with increasing mask
delay. A monotonic increase in proportion correct with
increasing mask delay means that additional available time for
processing the stimulus was always useful.

Pure nonselective transfer. Figure 9 shows data for pure
nonselective transfer, that is, all the trials on which the cue
occurred simultaneously with or after mask onset. Performance
increased very quickly in the first 100 ms and then reached
asymptote at around four or five letters. This indicates that
these subjects were able to read about four letters in less than
100 ms, which is at the same level that other investigators
have found (e.g., Sperling, 1963). Figure 9 also shows the
data from the whole-report procedure. Whole-report
accuracy is slightly lower than partial-report accuracy. We assume
that this slight whole-report deficit was due to the larger
number of letters that needed to be reported. Subjects might
have occasionally forgotten a letter while reporting the
earlier ones. Therefore the partial-report-plus-masking
procedure seems to be a slightly better indicator of nonselective
transfer than whole report.

The extreme right of Figure 9 shows that whole reports
with a 500-ms mask yielded equivalent performance to that
in the no-mask control condition. This result indicates that
the masking stimulus satisfied the second condition stated for
a successful mask: It did not interfere with the contents of
durable storage.

Pure selective transfer. The subset of conditions with a
cue delay of 0, which indicate pure selective transfer, yielded
data that are superficially similar to nonselective-transfer
data when graphed in terms of the actual number of letters
reported, as shown in Figure 10. Accuracy increased
monotonically with mask delay. However, selective transfer took
longer than nonselective transfer to approach its asymptotic
level (approximately 400 ms vs. 200 ms). The asymptotic
accuracy level of selective transfer was much higher than that
of nonselective transfer (90% vs. 30%), indicating a
partial-report advantage. As in the case of nonselective transfer,
when a mask was delayed 300 ms, there was only a negligible
difference between mask and no-mask conditions.

An  Aggregate-Row  Model  of  Iconic  Memory 

Experiment 2 characterized purely selective and purely
nonselective transfer. In an attempt to explain how they both
combine in the overall transfer to durable storage, we
developed a model that aggregates performance over rows.
Subsequently we found that although the model gave
excellent predictions of the present data, it left some serious
residual problems. To resolve these, we developed a more
complicated model in which each row is considered
separately. The formulation of the aggregate-row model is
presented in this section.

Figure 7. Accuracy of partial reports as a function of cue delay, with mask delay as the parameter:
Experiment 2. (The ordinate indicates the proportion of correctly reported letters. The right ordinate
indicates the corresponding number of letters transferred to durable storage. Each data point
represents 150-230 trials. Panels indicate data for Subjects BL and BF; BL viewed 3 x 3 displays,
and BF viewed 3 x 4 displays [Rows x Columns]. The curves drawn through the data points are
the best fitting predictions of the two-process aggregate-row model described in the text.)

Basic  Assumptions:  Additivity  of  Nonselective  and 
Selective  Transfer 

Both selective and nonselective transfer contribute to the
overall performance. In Experiment 2, only the contribution
made by nonselective transfer was directly observable. The
contribution of selective transfer could be observed only in
the absence of nonselective transfer, that is, when selective
transfer started immediately at stimulus onset with a cue
delay of 0. We now estimate selective transfer at nonzero cue
delays. We proceed by making an assumption about the
combination rule for selective and nonselective transfer. This
assumption allows us to subtract nonselective transfer from
overall performance to derive selective transfer at various
cue delays.

The simplest combination rule is additivity of the two
transfer processes. (Averbach & Coriell, 1961, made a
different assumption, which is considered in the Discussion.) To
implement additivity of transfer processes, we make the
following assumptions. (a) Letters are transferred
nonselectively from stimulus onset on until the cue comes. (b)
Selective transfer begins at onset of the cue and ends at onset
of the mask, when all further information transfer out of
iconic memory stops. (c) The total number of letters
transferred is the sum of both transfer processes. Specifically,
given a cue at time c and a mask at time m, the total number
of letters, $L_{c,m}$, transferred from the cued row is the sum of
the number of nonselectively transferred letters from the
cued row, $\tfrac{1}{3}N_{c,m}$, and the number of selectively transferred
letters from the cued row, $S_{c,m}$:

$$L_{c,m} = \tfrac{1}{3} N_{c,m} + S_{c,m}. \tag{1}$$

Figure 8. Results of Experiment 2 replotted to show the accuracy of partial reports as a function
of mask delay, with cue delay as the parameter. (The vertical bars through the data points indicate
the standard error of the proportions. The data points for each cue delay are connected by dotted
lines. See Figure 7 for details.)

Figure 9. Pure nonselective transfer as a function of mask delay. (The open symbols are the
proportion of correct partial reports on trials in which the cue occurred after mask onset [Figure 6b];
the filled symbols are the proportion of correct reports in whole-report-plus-mask trials. The
horizontal bar at the right border indicates performance on whole-report trials without masks. The
predictions of the aggregate-row model of iconic memory for partial and whole reports are indicated
by solid and dotted lines, respectively. The dashed line shows the partial-report predictions of the
model described in Equation 13, averaged over the three rows.)

The factor 1/3 expresses the fact that because there are three
equally likely cues, only one-third of the nonselectively
transferred letters are expected to be in the cued row. To apply
Equation 1 to our data, we note that we already know two
of its three components. If we assume, for the moment, that
all letters that are transferred from iconic memory to durable
storage are reported, then the partial-report data directly yield
the total reported letters, $L_{c,m}$. Partial reports made when the
cue occurs simultaneously with or after the mask give the
pure nonselective component, $\tfrac{1}{3}N_{c,m}$ ($c \ge m$), the analysis
illustrated in Figure 9. The difference between $L_{c,m}$ and
$\tfrac{1}{3}N_{c,m}$ is the selective transfer, $S_{c,m}$.

Figure 11 shows the values of selective transfer derived
from our data. Note that the only difference between Figures
8 and 11 is that the nonselective transfer component has been
subtracted from overall performance. All the curves for
selective transfer appear to be parallel, shifted vertically. This
implies that only one factor determines selective transfer:
time elapsed since stimulus onset. Selective transfer that
begins, for example, 200 ms after stimulus onset will transfer
just as many items in the time period from 200 to 300 ms as
selective transfer that began at 0 ms. Because the rate of
selective transfer depends only on the elapsed time since
stimulus onset, it directly reflects the quality of the stimulus
information.

To test the assumption of additivity, we fit the best set of
perfectly parallel curves to our data. We do not make any
assumptions about the form of the selective or nonselective
transfer curves. The solid line segments in Figure 11 all
derive from a single curve that has been translated up or down.


Figure 10. Pure selective transfer as a function of mask delay. (The filled symbols show the
proportion of correct partial reports on trials on which the cue occurred at stimulus onset [Figure 6c].
The solid line shows the prediction of the aggregate-row model of iconic memory. The horizontal
bar on the right border shows observed performance in a partial-report experiment without masks
and a cue delay of 0. The solid line shows the predictions of the model described in Equation 14,
averaged over the three rows.)




KARL R. GEGENFURTNER AND GEORGE SPERLING


Figure 11. Test of the additivity assumption in the aggregate-row model. (The curves are based on
Figure 8: accuracy of partial reports as a function of mask delay, with cue delay as the parameter.
Cue delays are in milliseconds: filled circle = 0, open circle = 100, filled square = 200, open
square = 300, triangle = 400. The assumption of algebraic additivity of transfer [Equation 2]
permits the subtraction of the estimated nonselective component of transfer [Figure 9] from each
curve of Figure 8 to yield the residual selective transfer. The symbols show observed values of
residual selective transfer after various cue delays. The solid curves show the predictions that are
based on vertical translations of a single generic selective-transfer curve [e.g., delay 0]. The form
of the generic selective-transfer curve was estimated from the data.)


The  assumption  of  additivity  holds  well  for  our  data.  The 
root-mean-square  error  is  0.016  for  subject  BL  and  0.023  for 
subject  BF. 

Some  Parametric  Assumptions 

The pure information transfer functions for the nonselective
transfer process in Figure 9 and the selective transfer in
Figure 10 can both be approximated by simple exponential
growth functions of the form

$$y(t) = C\,[1 - \exp(-t/\tau)], \tag{2}$$

where C is the asymptotic level of performance, and $\tau$ is the
time constant of the growth process, that is, the time at which
$y(t)$ reaches 63% of C.
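The exponential growth function of Equation 2 and the meaning of its time constant can be illustrated numerically. The parameter values below are hypothetical and chosen only for illustration; they are not the fitted values from the experiment.

```python
import math

def exponential_growth(t, capacity, tau):
    """Equation 2: cumulative transfer after time t, approaching `capacity`."""
    return capacity * (1.0 - math.exp(-t / tau))

# Hypothetical values for illustration only (not estimates from the data).
capacity = 4.0   # asymptotic number of letters
tau = 50.0       # time constant in ms

fraction_at_tau = exponential_growth(tau, capacity, tau) / capacity
fraction_at_3tau = exponential_growth(3 * tau, capacity, tau) / capacity

print(round(fraction_at_tau, 3), round(fraction_at_3tau, 3))
```

At t equal to the time constant the function has reached 1 - 1/e, about 63% of its asymptote, whatever the values of the two parameters; after three time constants it is at about 95%.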

We denote nonselective transfer as $N_{c,m}(t)$ and selective
transfer as $S_{c,m}(t)$. The indices remind us that transfer may
depend not only on t but also on the specific values of the
cue delay, c, and the mask delay, m. For pure nonselective
transfer, the cue comes after the mask, and we obtain

$$N_{c,m}(t) = C_N\,[1 - \exp(-t/\tau_N)], \quad c \ge m, \tag{3}$$

where $C_N$ is the capacity of durable storage and $\tau_N$ is the time
constant for nonselective transfer. Figure 9 shows Equation
3 with the constants chosen to optimize the fit to our data.
The deviations of data from theory are very small.

Purely selective transfer occurs when the cue comes at (or
before) the stimulus onset. Similarly to Equation 3, we obtain

$$S_{0,m}(t) = C_S\,[1 - \exp(-t/\tau_S)]. \tag{4}$$

In Equation 4, $C_S$ is the maximum number of letters the
subject can transfer from one line. In general, $C_S$ will be very
close to the number of letters in one line. However, $C_S$ has
to be estimated because subjects are not perfect and they
occasionally miss a letter even in the easiest conditions. The
time constant for selective transfer is $\tau_S$.


Figure 10 shows the best fit of Equation 4 to our data for
selective transfer. Again, deviations between the theory and
the data are small. From Figures 9 and 10, we see that the
growth rates of nonselective and selective information
transfer curves are quite different, reflecting their different time
constants, $\tau_N$ and $\tau_S$.

Selective transfer for cue and mask delays with $c \le m$
occurs only during the interval from c to m:

$$S_{c,m}(t) = S_{0,m}(t) - S_{0,c}(t). \tag{5}$$

Of course, $S_{c,m}(t)$ is 0 whenever $c \ge m$. The total number
of letters available for report in the cued row as a function
of time is given by a generalization of Equation 1:

$$L_{c,m}(t) = S_{c,m}(t) + \tfrac{1}{3} N_{c,m}(t). \tag{6}$$

The  total  number  of  letters  available  for  whole  reports  is 
simply 

A final complication is that the time a subject needs in
order to interpret the cue may be greater than zero. To admit
this possibility, a parameter $\tau_Q$, the cue interpretation time, is
included in the model as an offset parameter, substituting
$c + \tau_Q$ for c in Equation 6.
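The pieces of the aggregate-row model described above can be assembled into a short numerical sketch. It is a reading of the text rather than the authors' implementation: nonselective transfer accrues from stimulus onset until the cue takes effect (or until the mask, whichever comes first), selective transfer accrues along the generic selective curve only between the effective cue time and the mask, and one-third of the nonselectively transferred letters are credited to the cued row. Clamping the effective cue time to the interval from 0 to the mask delay is an assumption, and all parameter values are hypothetical.

```python
import math

def growth(t, capacity, tau):
    """Exponential growth toward `capacity` with time constant `tau` (0 for t <= 0)."""
    return capacity * (1.0 - math.exp(-max(t, 0.0) / tau))

def letters_from_cued_row(cue_ms, mask_ms, c_n, tau_n, c_s, tau_s,
                          tau_q=0.0, n_rows=3):
    """Predicted letters reported from the cued row once the mask has ended transfer.

    cue_ms, mask_ms : cue and mask delays from stimulus onset
    c_n, tau_n      : asymptote and time constant of nonselective transfer
    c_s, tau_s      : asymptote and time constant of selective transfer
    tau_q           : cue interpretation time, added to the cue delay
    """
    # Assumption: the cue cannot take effect before stimulus onset or after the mask.
    effective_cue = min(max(cue_ms + tau_q, 0.0), mask_ms)
    # Nonselective transfer runs from stimulus onset until the cue takes effect.
    nonselective = growth(effective_cue, c_n, tau_n)
    # Selective transfer follows the generic curve between cue and mask only.
    selective = growth(mask_ms, c_s, tau_s) - growth(effective_cue, c_s, tau_s)
    # Only 1/n_rows of the nonselectively transferred letters lie in the cued row.
    return selective + nonselective / n_rows

# Hypothetical parameters for illustration only (not the fitted values).
params = dict(c_n=4.0, tau_n=60.0, c_s=3.0, tau_s=100.0)

for cue in (0, 100, 200, 300, 400):
    print(cue, round(letters_from_cued_row(cue, 500, **params), 3))
```

With a cue delay of 0 the function reduces to the pure selective curve, and with the cue at or after the mask it reduces to one-third of the pure nonselective curve, which are the two special cases the text uses to estimate the parameters.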

Figure 12a summarizes the descriptive model. Two
cumulative functions, $N_{c,m}(t)$ and $S_{c,m}(t)$, describe
information transfer from iconic memory to durable storage. Before
occurrence of a cue, transfer is governed by $N_{c,m}(t)$; after the
cue, by $S_{c,m}(t)$. It is useful to think of the cue as a switch that
toggles between the two transfer rates.

Predictions for a partial-report-plus-mask experiment are
represented in Figure 12b. The number of letters available for
report follows the trajectory to $C_N$ until the occurrence of a
cue. It then follows the trajectory described by $S_{c,m}(t)$. After
the onset of the poststimulus mask, the predicted trajectory
is flat.

Figure 12b shows that the cue is predicted to help most

Figure 12. Panel a is a block diagram of the two-process
aggregate-row model of iconic transfer. (The first box indicates iconic
memory with a capacity $C_S$ decaying with time constant $\tau_S$ after a
brief stimulus presentation. The partial-report cue, after delay $\tau_Q$,
causes a shift from an initial nonselective transfer [$\tau_N$] to selective
transfer [$\tau_S$] into durable storage. All transferred items are added in
durable storage; its apparent capacity $C_N$ varies slightly depending
on whether it is determined from partial or from whole reports [see
Figure 9].) Panel b illustrates the computation in the two-process
model. (Dotted curves show selective and nonselective transfer.
Before the cue, transfer is nonselective and proceeds at rate $C_N/\tau_N$
to asymptote $C_N$. After the cue, transfer is selective at rate $C_S/\tau_S$ to
asymptote $C_S$. The arrows indicate that in effect, the generic
selective transfer curve is joined to the generic nonselective curve at
the moment in time that the cue takes effect.)


97%  of  the  variance  in  the  data  for  Subjects  BL  and  BF, 
respectively.  The  root-mean-square  errors  are  0.089  and 
0.095,  respectively. 

The same subjects served in an earlier partial-report experiment, similar to Experiment 1 but without poststimulus masks. These earlier data can be fit by the model derived from Experiment 2 without estimating any new parameters. Figure 13 shows the partial-report-without-masking data and the model predictions. The parameter values were estimated from the experiment using masks. The fit obtained this way does not deviate significantly from the data. The dashed lines in Figure 13 show the contributions of nonselective transfer. The model indicates that, after stimulus termination, selective transfer decreases much faster than one might expect from the relatively slow decay in partial-report superiority.

Table 2 summarizes the parameter estimates for Subjects BF and BL. Two sets of estimates are shown. One set, already described, was derived from the subsets of the data that provided the pure nonselective transfer and the pure selective transfer analyses illustrated in Figures 9 and 10. A second set of parameters was estimated from the complete data of the partial-report-plus-masking experiments. The comparison of these estimates is an indicator of the overall consistency of the model, which is quite good.

For each subject, the nonselective capacity parameters, $C_N$, are very similar in the three relevant data sets: the full partial-report-plus-masking data set, the cue-after-noise subset, and the whole-report data set. The capacities are five letters for BL and seven letters (well above normal) for BF.

The selective capacity estimate $C_S$ is effectively equal to 3, the number of letters in one row, for BL. It is about 5-10% less than 4 for BF, who was shown four-letter rows.

The time constants for selective and nonselective transfer, $\tau_S$ and $\tau_N$, are quite different from each other. Selective transfer continues to rise steadily until after 200 ms, whereas nonselective transfer asymptotes quickly after 100 ms. Both subjects have similar time constants, although their capacities differ.

For both subjects, the time $\tau_0$ necessary to interpret the cue is estimated to be 0 or slightly negative.

The speed of the transfer processes is determined by differentiating Equation 2. This results in

$$f'(t) = (C/\tau) \exp(-t/\tau).$$ (7)

For t = 0, Equation 7 reduces to

$$f'(0) = C/\tau.$$ (8)
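
As a rough check of Equation 8, the initial rates can be computed from the pure-selective and pure-nonselective estimates for Subject BF in Table 2. The helper name `transfer_rate` is hypothetical; the sketch only evaluates Equation 7 and converts from letters per millisecond to letters per second.

```python
import math

def transfer_rate(capacity, tau_ms, t_ms=0.0):
    """Equation 7, f'(t) = (C / tau) * exp(-t / tau), in letters per second."""
    per_ms = (capacity / tau_ms) * math.exp(-t_ms / tau_ms)
    return per_ms * 1000.0

# Subject BF, Table 2: nonselective C = 7.08, tau = 99.9 ms;
# selective C = 3.58, tau = 130.0 ms.
bf_nonselective = transfer_rate(7.08, 99.9)
bf_selective = transfer_rate(3.58, 130.0)
print(round(bf_nonselective, 1), round(bf_selective, 1))
```

The two values come out near 71 and 27.5 letters/s, in line with the rates of roughly 70 and 27 letters/s quoted for BF in the text.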


Enigmas 


when given within 100 ms of the stimulus onset. In the first 100 ms, the cumulative transfers $N_{c,m}(t)$ and $S_{c,m}(t)$ differ only slightly. After 100 ms, $N_{c,m}(t)$ reaches its asymptote, whereas $S_{c,m}(t)$ continues for at least another 300 ms.

Parameter  Estimations  and  Fits  to  the  Data 

The  curves  in  Figure  7  show  the  fit  of  the  complete  model 
to  the  data  of  both  subjects.  The  model  accounts  for  96%  and 


Computation of the initial transfer rates immediately at stimulus onset, $S'(0)$ and $N'(0)$, shows that nonselective transfer has a much higher rate. Seventy and 45 letters/s are transferred nonselectively for subjects BF and BL, respectively, and only 27 and 15 are transferred selectively. Note that the nonselective transfer rates are based on the total number of letters transferred into durable storage, not merely on the letters in the cued row. In the aggregate model, it is not obvious why the actual speed of nonselective and selective transfer should appear to differ by so much; this issue is addressed in the row-by-row model, presented later.

Figure 13. Data from a partial-report experiment without masks. (The open symbols show the proportion correct for various cue delays. The solid line shows the predictions of the two-process aggregate-row model. The dashed curve indicates the estimated component of performance resulting from nonselective transfer. The distance from the dashed line to the solid line [partial-report accuracy] represents the estimated contribution of the selective transfer process.)

It also is surprising that the estimated time the subject needs to interpret the cue, $\tau_0$, is essentially zero. Experiments

Table 2

Best Fitting Parameter Values for the Aggregate-Row
Model (Partial Report With Masking) and for Data
Subsets That Yield Estimates of Pure Selective and
Pure Nonselective Transfer in Experiment 2

Experiment      C_S     C_N     τ_S      τ_N      τ_0      rms error

Subject BF
Selective       3.58    —       130.0    —                 0.073
Nonselective    —       7.08    —        99.9     -16.62   0.021
Combination     —       —       —        —        —        0.023
Overall         3.82    7.35    191.1    114.31   -12.5    0.095
Partial report  —       —       —        —        —        0.049
Whole report    —       6.3     —        —        —        —

Subject BL
Selective               —       158.9    —        8.47     0.120
Nonselective    —       4.75    —        104.79   0.29     0.032
Combination     —       —       —        —        —        0.016
Overall         2.98    5.0     197.5    110.4    -13.6    0.089
Partial report  —       —       —        —        —        0.016
Whole report    —       5.0     —        —        —        —

Note. C_S and C_N represent attentional capacities, respectively, of selective and nonselective transfer, with units in letters; τ_S and τ_N represent time constants of selective and nonselective transfer, with units in milliseconds; rms is root mean square. In the selective and nonselective experiments, the parameters were estimated from subsets of the data that did not require using the additive combination rule. In the combination experiment, the combination rule that estimated only additivity, not any of the parameters, was tested. In the overall experiment, the complete model for Experiment 2 was tested. In the partial-report procedure, the partial-report-plus-mask parameters were used to predict the data from an earlier partial-report-without-mask experiment. The whole-report experiment entailed simply observation of subjects' performance. No parameters were estimated. A dash indicates that a parameter could not be estimated for a particular condition.


by Reeves and Sperling (1986) using visual cues showed that a spatial shift of visual attention took 300-400 ms. Sperling and Weichselgartner (in press) used a click in a go/no-go attention shift experiment that required only turning on attention, not actually shifting it in space. They found a modal switching time of about 100 ms. Our tonal cues, which required a three-choice reaction and a spatial shift of attention, would certainly be expected to have a much longer attention shift latency. These enigmas suggest the need for more complex analysis, which we provide in the next section by analyzing the data separately for each row.


Position  Effects 

Partial-Report  Accuracy  by  Row,  Cue  Delay, 
and  Mask  Onset  Time 

The probability of correct partial reports as a function of mask delay with cue delay as a parameter is displayed in Figure 14. Each panel shows data for a different row of the display. Partial reports of the middle row differ from reports of the top and bottom rows, and we consider the middle row first. Almost always, subjects report the middle row perfectly. Even in the hardest conditions (short mask delay, long cue delay), subjects report 80% of the middle-row letters correctly. Except for the earliest mask at 100 ms, all the other middle-row curves appear equal at a performance level of about 95% correct. There is no apparent iconic decay for the middle row. Obviously, the transfer to durable storage of letters from the middle row is nonselective, and in this the middle row differs from the other rows.

For the top and bottom rows, performance decreases from near perfect in easy conditions to near chance in the hardest conditions. Because nonselective transfer determines the asymptotic performance at long cue delays, the data for the top and bottom rows indicate there is much less nonselective transfer (and correspondingly more selective transfer) from these rows than from the middle row.

Figure 14. Accuracy of partial reports as a function of mask delay, with cue delay as a parameter, shown separately for each of the three stimulus rows. (Cue delays are in milliseconds; filled circle = 0, open circle = 100, filled square = 200, open square = 300, triangle = 400. Top, Middle, and Bottom denote the stimulus rows. BL and BF denote the subjects. Each data point represents the proportion correct in 50-100 trials. Note the large and highly significant performance differences between the rows. The curves are predictions of the nine-parameter attentional model [Equations 12-14], with parameters given in Table 3.)


Selective  and  Nonselective  Transfer  by  Row 


To estimate the amount of nonselective transfer, we consider the subset of data with cue onset at or after mask onset (as in Figure 9). Figure 15 shows nonselective transfer for the three rows. For both subjects, the middle row rises to a high asymptotic level within the first 100 ms. The other rows rise slowly and reach generally lower asymptotic levels.

Nonselective transfer: Strategy mixture versus pure strategy. To account for the subjects' good performance on the middle row in the nonselective transfer data, we contrast two possibilities: a trial-to-trial variation of transfer strategy (which, over a series of trials, most often favors the middle row) and a consistent strategy that favors the middle row on every trial. Suppose that, prior to the stimulus exposure on each trial, subjects preselected a particular row to transfer nonselectively immediately following the exposure. Suppose that from trial to trial, they switched their preferred rows, but on the average, they most often chose the middle row. In this strategy, we would expect to find some trials for the top and bottom row on which the subjects' performances were perfect or nearly so. This is not the case, however. Of the trials on which the cue indicated a report of the top or bottom row, fewer than 2% of the reports had all letters correct (compared with 70% for the middle row). From this, we infer that subjects did not switch between rows and that the consistent preference of the middle row in nonselective transfer is responsible for its higher nonselective transfer rate. Indeed, most of the letters reported in the nonselective conditions come from the middle row.

On the other hand, in the conditions that favor selective transfer, the proportions with which the different rows are sampled are nearly the same. This explains why the aggregate model yielded faster rates of letters actually entering durable storage for nonselective than for selective transfer: Nonselective transfer sampled mostly the fast middle row, whereas selective transfer provided an almost equal mixture of all three rows.

Selective transfer. Figure 16 shows pure selective transfer estimated (as in Figure 10) from trials on which the cue occurred simultaneously with the onset of the test flash. For the middle row, selective transfer yields the same apparent growth curve as nonselective transfer. Note that, for both subjects, selective transfer for the top and bottom row reaches almost perfect performance at the longest mask delays. This indicates that the information in the stimulus is still available at these long delays. Therefore, it is also available for nonselective transfer. The finding that nonselective transfer reaches asymptote at a lower level for the top and bottom rows must then be a consequence of a capacity limitation. There is a suggestion in the data that the cumulative selective transfer from the top and the bottom rows is an S-shaped function of time. This would mean that the transfer rate was slow in the beginning, reached a maximum value at intermediate times, and finally declined again to zero. A slow start suggests a delayed shift of attention; the slow final rate almost certainly indicates that the iconic image has decayed to illegibility.

Figure 15. Pure nonselective transfer as a function of mask delay for each of three stimulus rows and 2 subjects. (The data points are the proportion of correct partial reports on trials in which the cue occurred at or after mask onset [Figure 6b]. Circles indicate the top row, triangles indicate the middle row, and squares indicate the bottom row. The solid curves are predictions of the nine-parameter attentional model [Equation 12], with parameters given in Table 3.)

The additivity assumption of Equation 1 that yielded selective transfer by subtracting out the nonselective transfer can be applied to the row data to obtain the selective transfer for each individual row. The results are shown in Figure 17. We keep parallelism as a working hypothesis because it accounts for 99% of the variance of the data for both subjects.

Manifestations in the data of attention to a stimulus row. We assume that partial attention to a row slightly improves selective transfer of letters from that row relative to nonselective transfer and that complete attention to a row maximally facilitates selective transfer. Consider a graph of proportion correct versus mask delay with cue delay as the parameter (Figure 14). The earliest mask delay at which data from two cue delays, $c_1$ and $c_2$, diverge indicates the point at which the states of attention induced by $c_1$ and $c_2$ are sufficiently different to affect selective transfer. For example, consider cues that indicate the bottom row, and suppose $c_1 = 0$ and $c_2 = 100$ ms. In Figure 14, the data for $c_1 = 0$ first break away from the data for the other cue delays when m = 100, and the $c_1 = 0$ data are completely separate when m = 200. A mask occurring 100 ms after $c_1$ means there is no further transfer from the stimulus after 100 ms. For the data obtained with $c_1$ in this condition to differ from the data for the other $c_i$ implies that the cue must have acted to alter attention within 100 ms. Alternatively, we would have to reject our previous assumption that the mask terminates stimulus availability.

Figure 16. Pure selective transfer as a function of mask delay for each of three stimulus rows and 2 subjects. (Data points are the proportion of correct partial reports on trials in which the cue occurred at stimulus onset [Figure 6c]. Circles indicate the top row, triangles indicate the middle row, and squares indicate the bottom row. The solid curves are predictions of the nine-parameter attentional model [Equation 13], with parameters given in Table 3.)

Figure 17. Same data as Figure 14: accuracy of partial reports as a function of mask delay, with cue delay as a parameter. (Cue delays are in milliseconds; filled circle = 0, open circle = 100, filled square = 200, open square = 300, triangle = 400. The estimated amount of nonselective transfer has been subtracted [as in Figure 11] to yield the residual selective transfer. The symbols show estimated values of residual selective transfer after various cue delays.)

Figure 14 shows that, for the middle row, there is no clear divergence of data for different cue delays and therefore no evidence that attention does or does not affect transfer of the middle row. However, transfer from the top and bottom rows is obviously quite affected by attention. The data for cue delay c in Figure 14 tend to break upward from the pack of longer cue delays as soon as $m > c$. This indicates that our cues induce a measurable change in attentional state immediately after their occurrence.

The other aspect of the performance-versus-mask-delay data (Figure 14) that we have already dwelt on at length is the parallelism of the curves for different cue delays onward from the moment $m \ge c$ (Figure 17). Parallelism indicates that the state of selective attention is the same for all the conditions represented in the parallel curve sections. In other words, not only does attention switch quickly once the cue arrives, but it switches completely. If it did not switch all at once, then an early cue, $c_1$, would have produced a greater attentional shift to the indicated row at a subsequent time, $t_2$, than a cue, $c_2$, that did not occur until $t_2$. In that case, transfer measured at $t_2$ would be faster for $c_1$ than for $c_2$, and the parallelism in the data of Figure 17 would be violated. Because the data are effectively parallel, we also have to assume that within the context of our assumptions, attention shifts quickly and completely. These assumptions are formalized in the next section.

Attentional  Model  of  Transfer  From  Iconic  Memory 
to  Durable  Storage 

Assumptions 

To account for the analysis of partial-report-plus-masking data separately by rows, we generalize the aggregate model of Figure 12a in a natural way, as illustrated in Figure 18. In the aggregate-row model, nonselective transfer and selective transfer were each characterized by two parameters, their rate and the total capacity. Now these processes are made more explicit in terms of the states of attention they represent. Each state of visual attention is characterized by a spatial function that represents the allocation of attentional resources over space (Sperling & Weichselgartner, 1991) and a separable temporal function that represents the time period during which the spatial function reigns. Thus, nonselective transfer represents the default state of attention that exists from the beginning of the trial until the cue is received and interpreted. The spatial allocation of nonselective attention is described by three numbers ($C_{N,r}$, r = 1, ..., 3) that represent the capacity (in letters) of durable storage allocated to the top, middle, and bottom rows of the display. The single number of the aggregate-row model that described the capacity of durable storage for nonselectively transferred letters was $C_N = \sum_r C_{N,r}$.

Figure 18. Illustration of the attentional model of iconic memory transfer processes. (As in the aggregate-row model, before the cue, the initial state of attention determines nonselective transfer from iconic memory to a durable store. $C_{N,1}$, $C_{N,2}$, and $C_{N,3}$, respectively, indicate the relative amounts of attention allocated to the top, middle, and bottom rows before the cue. The row-dependent retinotopic component of iconic transfer rate is illustrated by the $p_r$ functions, which begin with stimulus onset. In response to a cue to report Row r, subjects shift attention instantaneously from its initial state to one of the three postcue states indicated by $C_S$. The actual transfer rate is the product $C_{N,r}\,p_r(t)$ before the cue and $C_S\,p_r(t)$ afterward.)

After the cue is received and interpreted, one of three states of selective attention occurs. Each state is characterized by $C_S$, the capacity allocated to the cued row (top, middle, or bottom), and by 0.0 capacity allocated to the other rows (Figure 18).

In the aggregate-row model, the two speeds of transfer from iconic memory to durable storage (nonselective and selective) were parameterized by their own time constants, $\tau_S$ and $\tau_N$ in Equation 7. In considering rows individually, it is obvious that all transfers are much quicker from the middle row and that three parameters, $\alpha_r$, r = 1, ..., 3, are needed to characterize the temporal differences between transfer rates from the three different rows. The exact temporal waveforms cannot be determined directly from our data. A mathematically tractable formulation that has useful properties is given by inserting a term $(t/\tau)^{\alpha_r - 1}$ into the time-dependent transfer of Equation 7, resulting in the time-and-row-dependent transfer

$$f'(t, \alpha_r) = C_r\,(t/\tau)^{\alpha_r - 1} \exp(-t/\tau).$$ (9)

The constant $C_r$ is a capacity allocated to row r.

We assume that the differences between nonselective and selective attention are completely captured by the spatial allocations of attentional capacity $C_r$, so that one set of row-dependent weights, $\alpha_r$, suffices for all states of attention. Because $\alpha_r$ depends on spatial location (the row) and does not depend on attention, it represents the intrinsic processing efficiency of a retinal location.

We wish to test our assumption that a single parameter suffices to describe the overall transfer rate of both selective and nonselective attention. Therefore, in parameter estimation, we estimate two overall rate parameters, one for selective and one for nonselective attention, to determine whether these unconstrained rate estimates indeed are similar.


Computational  Model 


Following Reeves and Sperling's (1986) attentional gating model, it is reasonable to assume that the transfer rate from a location, r, is determined by the product of two factors: (a) the availability (legibility) of stimulus information at r and (b) the amount of attention allocated to r. Availability at a location is determined by iconic buildup and decay, and it is parameterized by the exponent, $\alpha_r$, of Equation 9 combined with the exponential terms. Attentional allocation is parameterized by the capacity allocation, $C_r$. Equation 9 represents this product. Unfortunately, transfer mode appears implicitly in the time constant, $\tau$, of Equation 9. This means that attention allocation (which determines transfer mode) would be inextricably intertwined with iconic availability if $\tau_N$ and $\tau_S$ were to differ appreciably.

Cumulative transfer in an interval [0, m] is given by integrating over Equation 9. For simplicity, we change the variable of integration, giving

$$S_{\alpha_r}(m) = \int_0^{m} C_r\, t^{\alpha_r - 1} \exp(-t)\, dt.$$ (10)


By appropriately normalizing the exponential terms in Equation 10, we can convert Equation 10 into an attentional capacity ($C_r$, scaled in letters) multiplied by the well-known incomplete gamma function, $P(\alpha, x)$, $0 \le P(\alpha, x) < 1$:

$$P(\alpha, x) = \frac{\int_0^{x} t^{\alpha - 1} \exp(-t)\, dt}{\int_0^{\infty} t^{\alpha - 1} \exp(-t)\, dt}.$$ (11)
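
For readers who want to evaluate the normalized incomplete gamma function numerically, the following is a minimal Python sketch using only the standard library. It relies on the standard power series for the lower incomplete gamma function; the function name `reg_inc_gamma` and the tolerance settings are choices of this sketch, not of the original analysis, and the series is intended for the moderate arguments (a few time constants) that arise here.

```python
import math

def reg_inc_gamma(alpha, x, tol=1e-12, max_terms=10_000):
    """Regularized lower incomplete gamma function P(alpha, x), alpha > 0.

    Uses gamma(alpha, x) = x**alpha * exp(-x) * sum_n x**n / (alpha*(alpha+1)*...*(alpha+n))
    and divides by Gamma(alpha).
    """
    if x <= 0.0:
        return 0.0
    term = 1.0 / alpha
    total = term
    for n in range(1, max_terms):
        term *= x / (alpha + n)
        total += term
        if term < tol * total:
            break
    # Combine the prefactor in log space to limit rounding problems.
    return total * math.exp(-x + alpha * math.log(x) - math.lgamma(alpha))

# Compare the early growth for three exponents at x = 0.2 time constants.
for alpha in (0.38, 1.0, 2.27):
    print(alpha, round(reg_inc_gamma(alpha, 0.2), 4))
```

For an exponent of 1 the function equals 1 - exp(-x); an exponent below 1 gives a larger value early on, and an exponent above 1 gives a smaller one, which is the accelerated versus delayed growth distinguished in the text.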


In subsequent use, x will take four values: $c/\tau_N$, $c/\tau_S$, $m/\tau_N$, and $m/\tau_S$, representing the intervals from exposure onset to the cue and the mask, respectively, in units of the time constants of nonselective ($\tau_N$) and selective ($\tau_S$) transfer. For $\alpha = 1$, Equation 9 simply reduces to Equation 7. Values of $\alpha$ between 0 and 1 lead to an accelerated exponential growth function for $P(\alpha, x)$. Values higher than 1 lead to S-shaped delayed growth.

Equation 9 plays the same role in the row-by-row model as Equation 7 did in the aggregate-row model. It can be interpreted as representing the time course of the iconic image at each row location weighted by attentional allocation. To help the reader's intuition in following the exposition of the computational model, we show the iconic time course functions for three rows in Figure 18. The instantaneous transfer rate, $p_r(t) = t^{\alpha_r - 1} e^{-t}$, is the derivative of the cumulative transfer, $P(\alpha, x)$, given in Equation 11. As Figure 18 indicates, the rise time for the middle row is too fast to be observed in our experimental conditions; however, the middle row decays with what appears to be a familiar, monotonically decreasing function. The top and bottom rows rise before they decay. We defer to later the question of whether these top- and bottom-row functions truly represent the rise and decay of iconic memory. (Alternatively, they might represent a property of a model in which the absence of independent measurements of attention insufficiently constrains the partition of transfer rates into legibility and attentional components.)
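
The rise-then-decay behaviour of the top and bottom rows follows directly from the form of the instantaneous transfer rate. As a short worked step (with t measured in units of the time constant, and ignoring the normalizing constant), differentiating the rate gives:

```latex
p_r(t) = t^{\alpha_r - 1} e^{-t}, \qquad
\frac{d p_r}{d t} = t^{\alpha_r - 2} e^{-t} \left( \alpha_r - 1 - t \right), \qquad
t_{\text{peak}} = \alpha_r - 1 \quad (\alpha_r > 1).
```

For an exponent above 1 the rate therefore rises until $t = \alpha_r - 1$ and declines afterward; with the value 2.27 listed for the bottom row of Subject BF in Table 3, the peak would fall at about 1.27 time constants. For an exponent of 1 or less the derivative is negative for all $t > 0$, so the expression only decays, which matches the monotonic appearance of the middle-row function.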

The following equations summarize the model. Equation 12 describes the cumulative nonselective transfer that takes place from the onset of the stimulus until either a cue or a mask occurs. It is the product of terms representing two factors: attentional allocation, C, and retinotopic/stimulus factors, P:

$$N_{r,c,m} = C_{N,r}\, P(\alpha_r, c'/\tau_N), \qquad c' = \min(c, m).$$ (12)

The function P implies different transfer dynamics for each of the three rows r. Relative to the middle row, transfer from the top and bottom rows is both delayed and slower. Because delay and slowness are perfectly correlated, both are captured by the parameter $\alpha_r$.

There is a limit to the total number of letters that can be transferred from iconic to durable storage. How the subjects allocated space in durable storage to particular rows so as to optimize their performance is a matter that we did not attempt to control. Therefore, a parameter $C_{N,r}$ is needed for each row to describe the maximum number of letters of durable storage allotted to it (i.e., the default allocation of attention prior to the cue). Finally, the overall rate of nonselective transfer is determined by the time parameter $\tau_N$.

Equation 13 describes selective transfer. In this formulation, selective transfer begins instantly at the onset of the cue and ends instantly at the onset of the mask:

$$S_{r,c,m} = C_S \left[ P(\alpha_r, m/\tau_S) - P(\alpha_r, c/\tau_S) \right], \qquad m > c.$$ (13)

The cumulative transfer to durable storage depends on the integrated product of available information and attention (Reeves & Sperling, 1986). Available information is represented here by $P(\alpha, x)$, which, for the special case of $\tau_N = \tau_S$, depends only on elapsed time since onset of the stimulus. Attention is represented by the currently operative set of $C_r$ capacity values. Attention depends only on elapsed time since the onset of the cue. Therefore, when $\tau_N = \tau_S$, iconic time course and attention are independent.

Equation 14 expresses that the total number of letters transferred to durable storage from each row r is the sum of the nonselective and selectively transferred letters from r. It generalizes Equation 1 of the aggregate model:

= $N_{r,c,m} + S_{r,c,m}$. (14)
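
The three equations can be combined into a small numerical sketch. The Python code below is illustrative only: the function and dictionary names are hypothetical, the incomplete gamma function is evaluated by a standard power series, and the parameter values are those listed for Subject BF in Table 3 under the assumption that the constant 3.72 is the selective capacity $C_S$ and that the row-specific values are the nonselective capacities $C_{N,r}$.

```python
import math

def reg_inc_gamma(alpha, x, tol=1e-12, max_terms=10_000):
    """Regularized lower incomplete gamma P(alpha, x) via its power series."""
    if x <= 0.0:
        return 0.0
    term = 1.0 / alpha
    total = term
    for n in range(1, max_terms):
        term *= x / (alpha + n)
        total += term
        if term < tol * total:
            break
    return total * math.exp(-x + alpha * math.log(x) - math.lgamma(alpha))

# Subject BF, Table 3 (column assignment is an assumption of this sketch).
BF = {
    "c_s": 3.72, "tau_s": 115.0, "tau_n": 82.0,
    "c_n": {"top": 2.87, "middle": 3.21, "bottom": 1.31},
    "alpha": {"top": 1.39, "middle": 0.38, "bottom": 2.27},
}

def nonselective(row, c, m, p):
    """Equation 12: C_{N,r} * P(alpha_r, c'/tau_N) with c' = min(c, m)."""
    c_prime = min(c, m)
    return p["c_n"][row] * reg_inc_gamma(p["alpha"][row], c_prime / p["tau_n"])

def selective(row, c, m, p):
    """Equation 13: C_S * [P(alpha_r, m/tau_S) - P(alpha_r, c/tau_S)] for m > c."""
    if m <= c:
        return 0.0
    a = p["alpha"][row]
    return p["c_s"] * (reg_inc_gamma(a, m / p["tau_s"]) - reg_inc_gamma(a, c / p["tau_s"]))

def letters_transferred(row, c, m, p):
    """Equation 14: nonselective plus selective transfer from the cued row."""
    return nonselective(row, c, m, p) + selective(row, c, m, p)

# Cue delays and mask delay in ms, for the bottom row.
for c in (0, 100, 200, 300):
    print(c, round(letters_transferred("bottom", c, 300, BF), 2))
```

In this sketch an early cue yields more letters from the bottom row than a late cue at the same mask delay, which is the qualitative pattern described for the top and bottom rows.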

Parameter  Estimates  and  Their  Interpretation 

Best fitting parameters were estimated for Equations 12-14 by means of an optimization program (PRAXIS; see Brent, 1973; Gegenfurtner, 1992), using the partial-report-plus-masking data of Experiment 2. The results of parameter estimation are summarized in Figure 14 and Table 3. The model's predictions correlate very well with the data (.98 and .95 for the 2 subjects). The predicted average selective and nonselective transfer for the three rows is almost identical to the predictions of the aggregate-row model (see Figures 9 and 10). Therefore the row-by-row model (without additional parameters) also predicts the data from the whole-report experiment and from the partial-report experiment without masks.

The time constants τ_S for selective transfer and τ_N for
nonselective transfer are now both approximately 100 ms,
indicating that each transfer process completes in about the
same time. However, the actual transfer rates C_r/τ depend on
the row capacity. The aggregate-row model's capacity for
nonselective transfer, C_N, is now split up into the three C_r,N.
This set of rates defines the initial default attention state prior
to the cue. The high rate for the middle row indicates that the
default attention state is primarily focused on the middle row.
When a cue is received and interpreted, attention is shifted
to the cued row, and the transfer rate is determined by the
iconic legibility of that row, f(t, α_r) (Equation 9). In fact,
the selective capacity, C_S, is virtually the same as in the
aggregate-row model and nearly equal to the number of letters
in the row. In effect, the model assumes that once attention
is shifted away from the center row to the top or
bottom row, it is as effective at the top or bottom row as it
was in the center, and any difference in performance must be
accounted for by differences in iconic legibility.
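
As a rough numerical illustration of how similar time constants can coexist with different transfer rates, the sketch below divides each row's capacity by the time constant. It assumes that the row-varying capacity column of Table 3 is the nonselective capacity and uses the values listed for Subject BF. The function names and the exponentially limited growth form are illustrative assumptions rather than the fitted Equations 12-14, and the iconic legibility factor is deliberately left out.

```python
import math

# Subject BF values as read from Table 3 (assumed column assignment):
# row-specific nonselective capacities in letters, time constant in ms.
NONSELECTIVE_CAPACITY = {"Top": 2.87, "Middle": 3.21, "Bottom": 1.31}
TAU_N_MS = 82.0


def initial_rate(capacity, tau_ms):
    """Initial transfer rate, capacity / tau, in letters per millisecond."""
    return capacity / tau_ms


def limited_growth(capacity, tau_ms, t_ms):
    """Letters transferred by time t under an assumed exponentially
    limited growth process, C * (1 - exp(-t / tau))."""
    return capacity * (1.0 - math.exp(-t_ms / tau_ms))


# Same time constant for every row, but different capacities,
# hence different initial rates.
rates = {row: initial_rate(c, TAU_N_MS)
         for row, c in NONSELECTIVE_CAPACITY.items()}

for row, rate in rates.items():
    print(f"{row}: {rate:.4f} letters/ms")
```

Under these assumptions the middle row has the highest initial rate, which is the sense in which the default attention state favors the middle row.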

Table 3
Best Fitting Parameter Values for the Model That
Takes Differences Between Rows Into Account

Row       C_S     C_N     τ_S    τ_N    α

Subject BF
Top       3.72    2.87    115    82     1.39
Middle    3.72    3.21    115    82     0.38
Bottom    3.72    1.31    115    82     2.27

Subject BL
Top       2.85    0.85    109    97     2.25
Middle    2.85    2.78    109    97     0.50
Bottom    2.85    1.59    109    97     1.97

Note. C_S and C_N represent attentional capacities, respectively, of
selective and nonselective transfer, with units in letters; τ_S and τ_N
represent time constants of selective and nonselective transfer,
with units in milliseconds; α is a pure number (Equation 12) that
represents attentional dynamics.

Parallel versus serial process in nonselective transfer.
The α_r parameters represent the dynamics of buildup
and decay of iconic legibility at the retinal locations r.
They represent the availability of information from a
particular row regardless of whether it has been cued. In
fact, some time after stimulus termination (100 ms for
Subject BF, 200 ms for Subject BL; see Figures 15 and
17) the slopes of the nonselective transfer functions for
the top and bottom rows in Figure 15 are still as steep or
again become as steep as the initial slopes. This means that
after 100 or 200 ms, the availability of information from
these rows is as good as or better than it is immediately after
stimulus termination.

One interpretation of the delayed availability of information
from the top and bottom rows in nonselective transfer
is that iconic legibility builds up slowly but approximately
simultaneously in these noncentral rows. An alternative
explanation is based on serial processes. Prior to a cue, subjects
preprogram their attention to move away from fixation at
about the time they expect to have completed transfer of the
middle row to durable storage. Then they shift attention
randomly to either the top or bottom row. This would result in
an apparent delay in the availability of information from the
top and bottom rows. If subjects were indeed shifting attention
on nonselective report trials, it would greatly complicate
the analysis of the attentive and iconic components of
performance. The present data do not discriminate well between
these alternatives.

Attention and the iconic time course are inextricably
bound by multiplication in Equation 9: Only the product of
attentional allocation and iconic availability determines
performance. The model is a powerful computational device, but
without an independent verification of the attentional state
(or iconic availability), it is not a sufficiently precise tool to
dissect unambiguously the attentional and iconic components
of performance.

A comparison of the estimated values of α_1 and α_3 in Table
3 shows that the 2 subjects' iconic time courses are different
in the top and bottom rows, with BF favoring the top row and
BL the bottom row. Because these effects occur in both
nonselective and selective attention, the model assigns them to
the iconic time course. However, a more plausible interpretation
would suggest that they represent tendencies, or biases,
to shift attention up or down. According to this interpretation,
in response to a cue, BF shifts his attention faster to the top
row than the bottom row, and for BL it is just the reverse.
These biases in selective transfer mirror the subjects' bias in
nonselective transfer.

Although the model assumes that nonselective transfer
reflects a single initial attentional state, closer examination
of the data suggests that subjects first transfer letters from
the middle row and then fill up the remainder of durable
storage using nonselective transfer from the other rows.
That is, even nonselective transfer ultimately may have to
be modeled as consisting of two or more attentional states,
an initial state of attention to the middle row, followed by
attention to either the top or bottom row. Although this
precision of description is necessary for the accurate partition
of the components of performance (iconic decay, attention),
it is not necessary from a purely computational
point of view. The model accounts nicely for all the enigmas
that remained after the aggregate-row model and provides
a framework for dealing with the few problems that
remain.


General Discussion

The present data show the critical importance of nonselective
transfer in iconic memory experiments. By decomposing
performance into selective and nonselective transfer
and subtracting nonselective transfer from the total transfer,
we were able to isolate the selective component that depends
on the stimulus decay and attentional shifts. This isolation of
the two transfer processes was made possible by using a
completely crossed design of cue delay and mask delay. This
crossed design differs from previous investigations with
poststimulus masks (e.g., Averbach & Coriell, 1961; Irwin &
Brown, 1987), in which only one mask or cue delay was used
or cue and mask delay were correlated.

With respect to theory, we consider the three previous
computational treatments of information transfer from
iconic memory to durable storage. The earliest model
(Averbach & Coriell, 1961) is extremely simple because it
was developed for a more restricted paradigm. It proposes
both a selective and a nonselective transfer process, but it
embodies an assumption about probabilistic independence
between these two processes that is strongly contradicted
in our larger data set. Rumelhart's (1970) model is quite
similar to ours. It fails because it embodies an incorrect
assumption about subjects' strategies and another about
memory capacity limits. These two models, and ours, share
the common theme of two transfer processes. The third
model (Loftus et al., 1985) derives iconic decay properties
from a single nonselective transfer process. With respect to
nonselective transfer and iconic decay, there is considerable
agreement between Loftus et al.'s theory and ours,
although their theory is not intended to confront the two
transfer process issues that are our primary concern. In the
next three sections, we consider these models in more detail.
Then we briefly review noncomputational suggestions
about iconic transfer processes.

Probabilistic  Independence  of  Nonselective 
and  Selective  Transfer 

Averbach and Coriell (1961) did a partial-report experiment
in which the stimulus was two rows of 8 letters and
the required partial report was a single letter. A visual cue
("bar marker") appeared above or below the required letter.
Total transfer was determined in a partial-report experiment
with a cue to report 1 of 16 possible letters. Nonselective
transfer was estimated from report accuracy when
the cued letter was masked with a concentric annulus.
Selective transfer was estimated by correcting total transfer
for the nonselective component. Because Averbach and
Coriell's annulus was an effective letter masker only when
the annulus occurred after a letter, and not when it
occurred simultaneously, they ignored the data of the initial
parts of their masking curves. However, their experimental
results are generally similar to ours, even though the
experimental conditions are quite different. Overall, performance
was higher for our subjects.

Averbach and Coriell (1961) proposed the following
combination rule. Their basic unit of analysis was a single
letter, which could be transferred either selectively or
nonselectively. They regarded the two transfer types as independent
processes. Both transfers contribute probabilistically
to the proportion of correctly reported letters, much
as in Rumelhart's (1970) model. Averbach and Coriell
found huge performance differences for different letter positions
but decided to ignore them and average their data.
Moreover, they did not vary mask and cue delays independently,
so they were severely limited in what they were
able to do with their data and theory. For example, they
were noncommittal about whether nonselective transfer
ends when the cue occurs, about whether selective transfer
begins immediately upon cue onset, and about other issues
related to the underlying processes.

Averbach and Coriell's (1961) model is analyzed as follows.
Each letter has a certain probability of being transferred
by either process. Denote the event of a nonselective transfer
with N, the event of a selective transfer with S, and the event
of any kind of transfer with T:

P(T) = P(N) + [1 - P(N)] P(S).  (15)

It then follows that selective transfer is given by

P(S) = [P(T) - P(N)] / [1 - P(N)].  (16)

Equation 16 expresses the idea that two processes contribute
to partial-report accuracy, as do our Equation 2 (and its
subsequent elaborations) and Rumelhart's (1970) model
(discussed shortly).
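
The independence rule and its inversion can be checked with a short worked example. The sketch below is only an illustration of Equations 15 and 16; the function names and the probabilities 0.3 and 0.5 are hypothetical and are not taken from either data set.

```python
def total_transfer(p_n, p_s):
    """Equation 15: probability that a letter is transferred by either
    process when the two processes are probabilistically independent."""
    return p_n + (1.0 - p_n) * p_s


def selective_transfer(p_t, p_n):
    """Equation 16: selective transfer recovered from total and
    nonselective transfer under the independence assumption."""
    if p_n >= 1.0:
        raise ValueError("P(N) must be below 1 for the correction to exist")
    return (p_t - p_n) / (1.0 - p_n)


# Hypothetical probabilities, chosen only for illustration.
p_n, p_s = 0.3, 0.5
p_t = total_transfer(p_n, p_s)           # 0.3 + 0.7 * 0.5
recovered = selective_transfer(p_t, p_n)

# A plain subtraction of the nonselective part, for comparison with
# the independence correction.
simple_difference = p_t - p_n

print(p_t, recovered, simple_difference)
```

With these inputs the independence correction returns the selective probability that was put in, whereas the plain difference is smaller by the factor 1 - P(N). The two ways of combining the processes therefore diverge whenever nonselective transfer is nonzero.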

Figure 19 compares the rates of selective transfer (i.e., the
legibility of the iconic image) as derived from the present
model and from Averbach and Coriell's (1961) model. It
shows the number of letters selectively transferred during
successive 100-ms intervals plotted as a function of the time
at the end of the interval. The data points in Figure 19a are
derived from Equation 2. Figure 11 shows cumulative
selective transfer; Figure 19 shows selective transfer rate,
that is, successive differences between the points of the lines
in Figure 11. That all these successive differences fall on the
same iconic decay function should be no surprise. We
previously noted that all the curves of Figure 11 derive from a
single generic selective transfer function.

Figure 19b shows the predictions for Averbach and
Coriell's (1961) formulation (assuming that nonselective
transfer stops after the occurrence of the cue). When the cue delay
is zero, there is no nonselective transfer, and both our model
and theirs give the same predictions (indicated by the filled
circles). However, when nonselective and selective transfer
are combined (i.e., for any cue delay greater than zero), the
models differ. As already pointed out in the discussion of
Figure 11, our assumptions result in selective transfer rates
that depend only on the time since the onset of the cue.
Averbach and Coriell's model leads to large, highly irregular
estimates of selective transfer for a given time interval, and
no clear pattern emerges of how selective transfer is
determined. Their model cannot account for our data.

The consistency of the different independent estimates of
the iconic decay function demonstrates the value of our Cue
Delay × Masking Delay crossed design, which enables us
not only to estimate the parameters for our model, but also
to check our model's consistency.

Diffuse  Transfer  Followed  by  Focused  Transfer 

Rumelhart (1970) proposed a mathematical model of
partial-report experiments cast in terms of features. Features
were transferred with replacement from retinal locations and
aggregated to form letters. During the stimulus exposure,
features were equally available at all locations and all times.
After termination of the exposure, feature availability was
assumed to decay exponentially. The feature extraction rate
was assumed to have an absolute limit (capacity). Before a
cue was received, the overall feature extraction capacity was
spread equally over all locations. Immediately after a cue was
received, feature extraction capacity was concentrated
entirely on the cued locations.

The essential ideas of Rumelhart's (1970) model are quite
similar to those of our model, namely, that there is a default
precue attentional state followed by a postcue attentional
state and that the same transfer process operates in both states
(merely the row allocations are different). However,
Rumelhart was unaware that the precue state is not diffusely spread
over all rows but is concentrated on the middle row. In
addition, he had no explicit capacity limit for durable storage,
relying on limited stimulus availability to account for all
response limitations. This was obviously too restrictive an
assumption.

Rumelhart's (1970) representation of the probability of
correct reports, P, as the indirect result of a feature extraction
process would allow the P versus time graphs either to grow
like exponentially limited growth processes or to assume S
shapes. An S shape would result from the fact that before a
threshold number of features is collected at a location, the
probability of correctly reporting the letter at that location is
assumed to be at chance. After the critical number of features
is collected, the probability of correct report is assumed to
be 1.0. Although there is considerable flexibility in the
generation of S-shaped curves under the feature accumulation
assumption, and our empirical P versus t curves are, in a few
cases, S shaped, it seemed better not to burden our transfer
theory with such a complex assumption.

How  Much  Is  an  Icon  Worth? 

The nonselective transfer curves obtained in our experiments
appear very similar to the ones derived by Loftus et
al. (1985) in a paradigm using pictorial stimuli. They measured
the number of details subjects could report from briefly
exposed pictures. Exposure duration was varied, and a mask
followed stimulus presentation immediately after stimulus
offset. In a second condition, presentation of the mask was
delayed 300 ms. They found that a 300-ms mask delay after
the termination of a stimulus exposure led to the same levels
of performance as an additional 100-ms stimulus exposure.
They argued that an additional exposure of 100 ms is
equivalent to an icon that is available for 300 ms. In this and
subsequent experiments (Loftus, Duncan, & Gehrig, 1992;
Loftus & Hogden, 1988), with various stimulus materials and
tasks, they equilibrated iconic availability against an
equivalent continued exposure that yielded the same performance.
Ultimately, Loftus et al. (1992) derived an iconic decay
function in terms of equivalent continuation of the stimulus. Their
derived iconic decay functions were approximately, but not
precisely, exponential.

Figure 19. Derived iconic memory decay functions: Estimates of the rate of selective transfer at
a given time after onset of the 50-ms stimulus. (At each time t, selective transfer during the 100 ms
preceding t is estimated independently from each condition with cue delay less than t. [This requires
extraction of the selective transfer component from total transfer whenever cue delays are greater
than 0.] Symbols indicate cue delays in milliseconds: filled circle = 0, open circle = 100, filled
square = 200, open square = 300, triangle = 400. Data are shown for Subjects BF [left] and BL
[right]. Panel a shows estimates of selective transfer derived from our model [Equation 5]. Data
points are the differences between successive points on each of the lines of Figure 11. Insofar as the
different estimates all fall on the same iconic decay function, it substantiates our model of iconic
decay. Panel b shows estimates of selective transfer derived from Averbach and Coriell's [1961]
model [Equation 16] applied to our data. The wide variation [at a given time] of the different
estimates for selective transfer indicates that this model does not yield a consistent description of
iconic decay.)

The assumptions underlying Loftus et al.'s (1992) and our
analyses are quite similar, although they derived all their data
from whole reports. The main difference is that we present
iconic decay directly in terms of a transfer rate, whereas they
presented it in terms of the fraction of the transfer rate of a
continued stimulus exposure. Furthermore, in their procedures,
it apparently was not necessary to discriminate the
transfer rate at different retinal locations, which is critical in
our analyses. Their derived iconic decay functions agree
quite well with those we derive for the middle row.

Position  Effects 

Differences in performance for different parts of the display
have long been observed (Averbach & Coriell, 1961;
Holding, 1970; Sperling, 1960) but have not been taken into
account in the estimation of the duration of iconic memory.
The differences we observe are mainly lower transfer rates
for the top and bottom rows compared with the middle row.
Although in our formulation these locational factors are tied to
the stimulus, it is likely that they are based, at least in part,
on attentional factors. A relevant positional analysis was
done by Holding (1970). He varied the probability with
which each row was cued and found that performance varied
accordingly, implicating attention. However, Holding's analysis
was insufficient to discriminate a change in precue
nonselective strategy from a (postcue) difference in iconic decay.
Furthermore, like many others (see Long, 1980), we strongly
disagree with Holding's conclusion that this observation can
explain partial-report superiority without postulating an
intermediate store.

When a rapidly moving spot is illuminated by stroboscopic
flashes (temporally sampled motion), more than one spot
appears to move simultaneously. The number of apparently
visible spots is a measure of visible persistence, and this
number varies with retinal location, persistence being longer
for peripheral than for foveal stimuli (Farrell, Pavel, &
Sperling, 1990). Eventually, such spatial nonhomogeneities of the
visual system will have to be reflected in accounts of iconic
decay.

Strategy 

The results obtained in Experiment 1 seem to contradict
earlier results by Sperling (1960) that showed an influence
of subjects' strategy. In Experiment 1, performance for a
given cue delay varied depending on which cue delays were
given in preceding sessions. We can resolve this contradiction
by looking at subjects' overall performance level. Our
subjects were well practiced. Under ideal conditions (no
mask or cue delay), they achieved a performance level of
95-100% correct. Subjects in Sperling's study achieved
70-95% correct under the same conditions. This suggests that
these early strategy effects were due to the use of nonoptimal
strategies that would have been discarded after additional
practice.

Other  Models 

In recent studies, Irwin and Brown (1987) and Irwin and
Yeomans (1986) tested an alternative conception of iconic
memory. Their theory also has two buffers, analogous to the
iconic memory and short-term memory concepts of traditional
theories. It assumes that the coding in iconic memory
has two separate representations, one for identity and another
for spatial location. This distinction does not bear directly on
the distinction between nonselective and selective transfer,
but it is relevant to the general issue.

We did not analyze our data for errors of intrusion and
location, but we have some important relevant observations
from serving as subjects ourselves and from speaking to
subjects in iconic memory experiments. When a subject happens
to be attending to one row while another one is cued (and fails
to perceive the cued letters), the tendency is not to guess at
random, but to report the (nonselectively) transferred letters
even though they are known to be in the wrong row. To a
subject, it seems better to report letters at least known to have
been somewhere in the stimulus than to report random letters,
the reasoning perhaps being that the cue or the rows may have
been misperceived. Therefore, in assessing location errors, it
is critical to use additional measures to assess the nature of
the errors. For example, Sperling and Dosher (1986) noted
that when items were reported with high confidence, location
errors were extremely rare and practically never extended
beyond an adjacent location. Irwin and Yeomans (1986)
supported the notion of row juxtaposition. In their study,
incorrect letters mostly came from an incorrect row in the correct
column of the display.

Summary  and  Conclusion 

We experimentally identified two transfer processes,
nonselective and selective, in the partial-report task. Our data
provided strong evidence that performance in the
partial-report task is given by the algebraic sum of these two
processes. Experiment 1 showed that independent of cue delay,
subjects use only one strategy in a partial-report experiment.
Experiment 2 showed that this strategy consists of
nonselectively transferring letters until the cue appears and
afterwards selectively transferring them.

The many complexities of these experiments are accurately
described by a computational model that makes several
plausible assumptions. Transfer rates are determined by the
product of iconic legibility of the stimulus (which depends
on the elapsed time after stimulus exposure and on the retinal
location) and the subject's attentional state. Nonselective
transfer is characterized by rapid transfer of the middle row
and much slower transfer of other rows. This precue
attentional state is parameterized in the computational model by
the precue capacity allocations weighted heavily toward the
middle row.

Immediately after the cue, attention shifts to the cued row
of the display. Postcue capacity allocation is maximum for
the cued row and zero for the others. From this moment on,
until the poststimulus mask ends all iconic transfer, selective
transfer occurs from the cued row. Nonselective transfer is
focused mainly on the middle row, whereas selective transfer
focuses exclusively on the cued row, so that selective transfer
produces more correct items on the average, that is, a higher
effective transfer rate. However, empirically determined rate
constants (completion times) for nonselective and selective
transfers are approximately the same (τ ≈ 100 ms),
suggesting that all transfers represent the same process and the
different effective rates reflect different states of attention,
different retinal locations, and different likelihoods that the
transferred items will be in the cued row.
We examine apparent motion carried by textural properties. The texture stimuli consist of a sequence
of grating patches of various spatial frequencies and amplitudes. Phases are randomized between
frames to ensure that first-order motion mechanisms directly applied to stimulus luminance are not
systematically engaged. We use ambiguous apparent motion displays in which a heterogeneous motion
path defined by alternating patches of texture s (standard) and texture v (variable) competes with a
homogeneous motion path defined solely by patches of texture s. Our results support a one-dimensional
(single-channel) model of motion-from-texture in which motion strength is computed from a single
spatial transformation of the stimulus, an activity transformation. The value assigned to a point in
space-time by this activity transformation is directly proportional to the modulation amplitude of the
local texture and inversely proportional to local spatial frequency (within the range of spatial
frequencies examined). The activity transformation is modeled as the rectified output of a low-pass
spatial filter applied to stimulus contrast. Our data further suggest that the strength of texture-defined
motion between a patch of texture s and a patch of texture v is proportional to the product of the
activities of s and v. A strongly counterintuitive prediction of this model borne out in our data is that
motion between patches of different texture can be stronger than motion between patches of similar
texture (e.g. motion between patches of a low contrast, low frequency texture l and patches of high
contrast, high frequency texture h can be stronger than motion between patches of similar texture h).

Second-order  motion  Motion  metamers  Motion  energy  Motion  correspondence 
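
The counterintuitive prediction stated in the abstract can be made concrete with a few lines of arithmetic. The sketch below assumes a proportionality constant of 1 and uses hypothetical amplitudes and spatial frequencies for the textures l and h; only the two proportionality rules (activity as amplitude divided by spatial frequency, motion strength as a product of activities) are taken from the text.

```python
def activity(amplitude, spatial_frequency):
    """Activity of a texture patch: proportional to modulation amplitude
    and inversely proportional to spatial frequency (constant set to 1)."""
    return amplitude / spatial_frequency


def motion_strength(patch_a, patch_b):
    """Strength of texture-defined motion between two patches, taken as
    the product of their activities."""
    return activity(*patch_a) * activity(*patch_b)


# Hypothetical (amplitude, spatial frequency) pairs, illustration only.
texture_l = (0.4, 1.0)   # low contrast, low frequency
texture_h = (0.8, 4.0)   # high contrast, high frequency

heterogeneous = motion_strength(texture_l, texture_h)  # path l -> h
homogeneous = motion_strength(texture_h, texture_h)    # path h -> h

print(heterogeneous > homogeneous)
```

With these values texture l has the higher activity despite its lower contrast, so the path between dissimilar patches l and h is stronger than the path between two identical h patches.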


INTRODUCTION 
First-order  motion  extraction 

Drifting spatiotemporal modulations of various sorts of
optical stuff (such as luminance, contrast, texture,
binocular disparity, etc.) can induce vivid motion percepts;
in each case "something" appears to move from one
place to another. This introspective description,
however, does not necessarily reflect the underlying processes
in human visual motion processing.

The study of visual motion extraction mechanisms has
traditionally focused on rigidly moving objects, projecting
drifting modulations of luminance. Several physiologically
plausible computational models have been
proposed to extract motion information from drifting
luminance modulations. Examples are the gradient
detector (see Moulden & Begg, 1986) and the Reichardt
or correlator detector (see Reichardt, 1961). These
detectors are designed to detect drifting luminance
modulations (or their linear transformations) and are
therefore called first-order motion extraction mechanisms
(Cavanagh & Mather, 1989).

*Department of Psychology and Center for Neural Science, New York
University, New York, NY 10003, U.S.A.
†Present address: Utrecht Biophysics Research Institute (UBI), Buys
Ballot Laboratory, Utrecht University, Princetonplein 5, 3584 CC
Utrecht, The Netherlands.
‡Present address: Department of Psychology, Rutgers University, New
Brunswick, NJ 08903, U.S.A.

Psychophysical experiments (e.g. van Santen & Sperling,
1984; Werkhoven, Snippe & Koenderink, 1990b)
have shown that motion perception of drifting modulations
of luminance is well explained by a first-order
computation called motion energy extraction. Indeed,
most current models of first-order motion detection
(e.g. Reichardt detectors and gradient detectors) have
now been shown to be equivalent or approximately
equivalent to some variant of motion energy extraction
(Adelson & Bergen, 1986; van Santen & Sperling, 1985).
A standard approach to first-order motion energy extraction
(e.g. Heeger, 1987; Adelson & Bergen, 1985)
proposes that the visual system uses a battery of
spatiotemporally oriented filters, each of which yields a
real-valued function of the visual field over time. The output
of each filter is squared at each location in space to
obtain a measure of local energy at the spatiotemporal
frequency to which that filter is tuned. The squared
outputs of these filters (motion energies) comprise the
input to a higher order process that computes a velocity
flow field. For example, Heeger’s (1987) model is built on
the observation that the Fourier transform of a rigidly
translating pattern has all its energy contained in a plane
through the origin in frequency space. Each motion
energy detector (narrow-band, spatiotemporal linear
filter followed by squaring) has its energy confined to a
Gaussian neighborhood of frequency space near the
origin. The velocity vector assigned to a given point in
space at a given time is obtained by (i) weighting the
energy spectrum of each detector by that detector’s
response, and (ii) finding the plane through the origin of
frequency space that absorbs the greatest amount of this
locally measured motion energy.
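
The filter-then-square idea described above can be illustrated with a small Python sketch. It is a simplified stand-in, not the Heeger (1987) or Adelson & Bergen (1985) model: it uses one global pair of sinusoidal filters in quadrature instead of a battery of local narrow-band filters, and the final rightward-minus-leftward subtraction is added here only to obtain a signed number. The names `motion_energy`, `drifting_grating`, `fx` and `ft` are hypothetical.

```python
import numpy as np

def motion_energy(stimulus, fx, ft):
    """Signed motion energy of an x-t stimulus (array indexed [t, x]).

    fx: spatial frequency in cycles per sample (assumed filter tuning).
    ft: temporal frequency in cycles per frame (assumed filter tuning).
    Returns rightward energy minus leftward energy.
    """
    n_t, n_x = stimulus.shape
    t = np.arange(n_t)[:, None]
    x = np.arange(n_x)[None, :]
    energy = {}
    for direction, sign in (("right", -1.0), ("left", 1.0)):
        phase = 2 * np.pi * (fx * x + sign * ft * t)
        even = np.sum(stimulus * np.cos(phase))  # linear filter, even phase
        odd = np.sum(stimulus * np.sin(phase))   # same filter shifted by 90 deg
        energy[direction] = even**2 + odd**2     # squaring discards phase
    return energy["right"] - energy["left"]

def drifting_grating(n_t, n_x, fx, ft, direction=1, phase=0.0):
    """Luminance grating drifting right (direction=+1) or left (-1)."""
    t = np.arange(n_t)[:, None]
    x = np.arange(n_x)[None, :]
    return np.cos(2 * np.pi * (fx * x - direction * ft * t) + phase)

rightward = drifting_grating(16, 32, fx=1 / 8, ft=1 / 4, direction=1)
leftward = drifting_grating(16, 32, fx=1 / 8, ft=1 / 4, direction=-1)
print(motion_energy(rightward, 1 / 8, 1 / 4) > 0)
print(motion_energy(leftward, 1 / 8, 1 / 4) < 0)
```

Because the even and odd filter outputs are squared and summed, the result in this sketch does not change when the starting phase of the grating is changed, which is the sense in which energy is a phase-independent measure.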

Second-order motion extraction

Chubb and Sperling (1988, 1989a, b, 1991) demonstrated
broad classes of drift-balanced and microbalanced
stimuli that clearly appeared to move but for which even
complete knowledge of the energy of all their Fourier
components would be useless in deciding whether their
motion was to the left or to the right (see also Cavanagh,
Arguin & von Grunau, 1989; Lelkens & Koenderink,
1984; Mather, 1991; Ramachandran, Rao & Vidyasagar,
1973; Turano & Pantle, 1989; Victor & Conte, 1990).
Thus first-order motion energy extraction fails completely
to account for the perception of motion in
drift-balanced stimuli. Such stimuli are said to elicit
second-order motion perception (Cavanagh & Mather,
1989; Chubb & Sperling, 1988). In second-order motion
stimuli, what drifts is not a luminance modulation but a
modulation of contrast, spatial frequency, texture
type, flicker, or some other stimulus property.

Stages. Let L be the spatiotemporal luminance function
defining a stimulus. The luminance at point (x, y)
at time t is then denoted L(x, y, t). In our analysis, we
discriminate three stages for the extraction of motion
information from L: preprocessing; flow field extraction;
and decision.

First, a preprocessing stage in which one or more
transformations T_i are applied to L yielding a set of
real-valued, time-varying, “neural images” T_i(L)
(Robson, 1980). The value at point (x, y) at time t that results
from applying T_i to L is thus denoted T_i(L)(x, y, t).
Usually, we think of i as referring to the dominant
spatial frequency of a transformation, that is, its scale.

Second, each time-varying neural image T_i(L) is the
input to a motion-analysis stage V_i whose output is a
(time-varying) velocity flow field V_i ∘ T_i(L) = V_i[T_i(L)].
For any point (x, y) in the visual field and every time
t, the value V_i ∘ T_i(L)(x, y, t) is a two-dimensional
vector that indicates estimated pattern velocity of the
transformed image T_i(L) in the neighborhood of
(x, y) at time t. The scale of V_i corresponds to that of T_i.
Associated with V_i ∘ T_i(L) is a real-valued function
S_i(L) that gauges the reliability or strength of the
velocity estimate provided by V_i ∘ T_i(L). For instance,
the velocity estimate obtained at point (x, y) and time t
may have been computed from sparse or noisy data.
In this case, irrespective of estimated direction or speed,
the strength S_i(L)(x, y, t) of the estimated velocity
V_i ∘ T_i(L)(x, y, t) may be low.

Finally, all the velocity flow fields V_i ∘ T_i(L) and
their associated strength maps S_i(L) feed into a decision
mechanism; its output determines the direction of apparent
motion in ambiguous displays.


The preprocessing transformation T_i can be either
linear or nonlinear. Generalizing previous terminology,
we say that any system that employs linear preprocessing
performs first-order motion extraction, whereas nonlinear
preprocessing performs second-order motion extraction
(e.g. Cavanagh et al., 1989; Chubb & Sperling,
1988).
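
To make the linear versus nonlinear distinction concrete, here is a minimal Python sketch on a static one-dimensional snapshot of a contrast-modulated stimulus. Everything in it is an illustrative assumption: the random binary carrier, the sinusoidal contrast envelope, the gain of the linear stage, and the use of the absolute value as the nonlinearity. It only shows that a linear transformation leaves the contrast envelope hidden in the neural image, whereas a nonlinear one exposes it for a later motion-analysis stage.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4096
x = np.arange(n)

# Contrast envelope (the property that would drift in a second-order stimulus).
envelope = 1.0 + 0.5 * np.cos(2 * np.pi * x / 64)
# Zero-mean random carrier: the sign of the contrast at each point is random.
carrier = rng.choice([-1.0, 1.0], size=n)
stimulus = envelope * carrier

def linear_preprocess(signal, gain=2.0):
    """A linear initial transformation (here simply a gain)."""
    return gain * signal

def fullwave_preprocess(signal):
    """A nonlinear initial transformation (absolute value)."""
    return np.abs(signal)

def envelope_correlation(neural_image):
    """Correlation between a neural image and the contrast envelope."""
    return float(np.corrcoef(neural_image, envelope)[0, 1])

print(envelope_correlation(linear_preprocess(stimulus)))
print(envelope_correlation(fullwave_preprocess(stimulus)))
```

In this sketch the linearly transformed image is nearly uncorrelated with the envelope, while the rectified image reproduces it.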

We refer to the transformations V_i ∘ T_i as motion
channels. T_i is called the initial transformation and V_i the
motion extractor. S_i is called the strength measure of the
channel.

Motion-energy detection vs motion-correspondence detection.
Both first- and second-order motion channels
can be further classified by the type of motion extraction
they use. A review of the literature on motion perception
shows that two types of motion extractor have been
considered and tested experimentally. We call these types
of motion extraction motion energy extraction and
motion correspondence extraction.

Motion energy extraction computes the directional
energy of a Fourier representation of the drifting modulation
signal, that is, the relative energy of “drifting”
spectral components. Within the constraints set by frequency
resolution, energy extraction is independent of
the relative phase of the different spatial Fourier components
of the modulation signal (van Santen & Sperling,
1984). In this respect, motion energy extraction
computations are largely insensitive to similarities between
items in a motion path. The first-order motion
analysis models noted above (Reichardt, 1961; Adelson
& Bergen, 1985; Marr & Ullman, 1981) all share this
property.

Traditionally, however, psychophysicists have interpreted
results of a wide range of motion experiments in
terms of correspondence extraction. The metaphor of
correspondence extraction describes motion as the convection
of some invariant aspects of spatial structure
over time. Thus, motion correspondence extraction depends
on similarity of local features. The more nearly
similar are two adjacent features that are separated by
an interval in time, the greater will be the strength of
motion between them.

The distinction between motion energy extraction and
motion correspondence extraction can be summarized as
follows: let α and β be two points separated by a brief
interval in space and time, and let v_α and v_β be the
stimulus intensities at α and β. Then motion energy
extraction yields a motion strength that is a monotonically
increasing function of the product v_α v_β. Motion
correspondence extraction yields a motion strength between
α and β that is a decreasing, nonnegative function
of |v_α − v_β|.

Typically, motion channels using correspondence extraction
yield higher motion strengths between similar
textures than between dissimilar textures. In particular,
a motion channel using a correspondence extractor can
never yield motion strength between a patch of optical
stuff A and a patch of different stuff B that is greater
than the motion strength between two patches of stuff A.
This can easily happen, however, for motion channels
using energy extractors. Suppose, for instance, that
v_A > v_B are the respective values assigned stuff
A and stuff B by a channel's initial transformation.
Then, motion energy extraction yields greater strength of
motion between a patch of A and a patch of B (v_A v_B)
than between two patches of B (v_B v_B).
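
The contrast between the two extraction rules can be checked numerically. The sketch below assumes particular functional forms that the text does not specify: the identity as the monotonically increasing function of the product, and a decaying exponential as the decreasing, nonnegative function of the difference. The function names and the example activities are hypothetical.

```python
import math

def energy_strength(v_first, v_second):
    """Energy rule: increasing function of the product (identity assumed)."""
    return v_first * v_second

def correspondence_strength(v_first, v_second, scale=1.0):
    """Correspondence rule: decreasing, nonnegative function of the
    absolute difference (exponential decay assumed)."""
    return math.exp(-abs(v_first - v_second) / scale)

v_A, v_B = 2.0, 1.0  # assumed activities, with v_A > v_B > 0

# Energy: the mixed A-B path beats the homogeneous B-B path.
print(energy_strength(v_A, v_B) > energy_strength(v_B, v_B))

# Correspondence: a mixed path never beats a homogeneous one.
print(correspondence_strength(v_A, v_B) <= correspondence_strength(v_A, v_A))
print(correspondence_strength(v_A, v_B) <= correspondence_strength(v_B, v_B))
```

Any other monotonic choices with the same direction of change would give the same qualitative ordering.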

Motion-from-texture

The purpose of this paper is to characterize the
mechanism of second-order motion perception in the
subclass of drift-balanced stimuli for which motion is
defined by a modulation of spatial texture properties. To
reiterate, it is not produced by a moving texture patch;
that would be rigid, luminance-defined motion.
Texture-defined motion is most conveniently produced by a
moving patch that is filled with a particular type of
texture in which each successive frame represents a new,
uncorrelated instance of that texture type (Chubb &
Sperling, 1989a, 1991). As is true for all drift-balanced
motion stimuli, an intriguing aspect of texture-defined
motion perception is that (unlike perception of
luminance-defined or first-order motion) it cannot be
explained by Fourier energy or autocorrelational motion
analysis (standard motion analysis).

An early example of texture-defined motion was reported
by Ramachandran et al. (1973). Detailed studies
and analysis were recently presented by Chubb and
Sperling (1988, 1989a, b, 1991), Cavanagh et al. (1989),
Mather (1991), Turano and Pantle (1989), and Victor
and Conte (1989).

We construct stimuli for which energy and correspondence
mechanisms yield different predictions for the
strength of texture-defined motion (Werkhoven et al.,
1990b). The resulting data demonstrate that texture-defined
motion is computed by an energy mechanism,
and not a correspondence mechanism. We will also show
how psychophysical data can be used to discriminate
between these two sorts of mechanisms in human perception
of texture-defined motion. More importantly, these
data indicate clearly that, for the class of textures we use
(similarly oriented patches of random-phased sinusoidal
grating with different spatial frequencies and contrasts),
texture-defined motion perception can be modeled in
terms of a single motion energy channel.

Energy  channels 

Texture grabbers. Chubb and Sperling (1989a, 1991)
suggested a two-stage mechanism for extracting texture-defined
motion. Under their model, texture-defined
motion is computed by motion energy channels whose
initial transformations are called texture grabbers. As
discussed below (see Rectification), a texture grabber is
a linear spatial filter followed by rectification. In Stage
2, the time-varying output (activity) from each texture
grabber is subjected to motion energy extraction.

Rectification.  By  rectification  we  mean  any  function 
that  is  zero  for  an  input  of  zero,  and  is  monotonically 
increasing  for  both  positive  and  for  negative  real  inputs. 

Previously, Chubb and Sperling (1989b) demonstrated
stimuli displaying systematic second-order motion that
could be easily explained in terms of a texture grabber
that used fullwave rectification (e.g. absolute value,
square, etc.). However, the motion of these stimuli was
inaccessible to any mechanism whose texture grabber
used halfwave rectification (nonzero output only for
positive or only for negative inputs). These results
suggest that at least some of the texture grabbers
used in second-order motion perception use fullwave
rectification. It remains to be seen whether there are
second-order motion mechanisms that use halfwave
rectification. In the present context, however, we do not
distinguish between different kinds of rectification. The
essential nonlinear characteristic of texture extraction
processes has also been recognized by Bergen and Adelson
(1988) and Caelli (1985).

The linear filter used by a texture grabber is presumed
to be realized in the visual system by an array of linear
neurons, all with the same receptive field profile, distributed
across the visual field. The texture grabber
output results from applying some fixed, rectifying nonlinearity
(e.g. the absolute value or the square) to the
output of each of these linear neurons. It is assumed that
the spatial filter of Stage 1 operates on stimulus contrast
(see Model), rather than on luminance, but this assumption
is not critical to our arguments. The output of a
linear filter may be positive or negative depending on the
local phase of the sensed texture. Thus the expectation
of the output of such a filter is zero over the
phase-randomized texture patches from which our stimuli are
constructed. The purpose of rectification is to produce a
positive average output across the texture so that a
texture grabber registers the presence or absence of
texture, independent of local phase. Indeed, that is why
the Stage-1 transformation (linear spatiotemporal filter
followed by rectification) is called a texture grabber.
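
The role of rectification can be sketched in a few lines of Python. The receptive field below (an even-symmetric, Gaussian-windowed cosine) and its parameters are assumptions chosen for illustration, not the filter of any particular model; `linear_response`, `fullwave` and `halfwave` are hypothetical names. Averaging over evenly spaced phases stands in for the expectation over phase-randomized patches.

```python
import numpy as np

def linear_response(phase, freq=0.1, sigma=6.0, size=41):
    """Output of one linear unit centred on a grating patch of given phase."""
    x = np.arange(size) - size // 2
    receptive_field = np.exp(-x**2 / (2 * sigma**2)) * np.cos(2 * np.pi * freq * x)
    patch = np.cos(2 * np.pi * freq * x + phase)  # contrast profile of the texture
    return float(np.dot(receptive_field, patch))

def fullwave(response):
    """Fullwave rectification (absolute value)."""
    return abs(response)

def halfwave(response):
    """Halfwave rectification (keep positive responses only)."""
    return max(response, 0.0)

phases = np.linspace(0.0, 2 * np.pi, 64, endpoint=False)
linear = [linear_response(p) for p in phases]

mean_linear = float(np.mean(linear))                            # cancels out
mean_fullwave = float(np.mean([fullwave(r) for r in linear]))   # positive
mean_halfwave = float(np.mean([halfwave(r) for r in linear]))   # positive
print(mean_linear, mean_fullwave, mean_halfwave)
```

The linear output averages to zero across phases, so it cannot signal the presence of the texture; either rectified output has a positive mean and therefore can.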

Activity.  The  output  of  a  texture  grabber  in  response 
to  a  particular  texture  is  called  activity. 

Motion energy channels. Together, a texture grabber
followed by motion energy extraction form one
(texture-defined motion) energy channel.

Motion correspondence channels. Together, a texture
grabber followed by motion correspondence extraction
form one (texture-defined motion) correspondence
channel.

Previous research in texture-defined motion

Historically, motion correspondence has been investigated
with ambiguous motion displays in which motion
is perceived as occurring along one or the other of
several competing paths. Most studies have dealt with
stimuli that stimulated the first-order motion system
(e.g. Burt & Sperling, 1981; Kolers, 1972; Navon, 1976;
Papathomas, Gorea & Julesz, 1991; Shechter, Hochstein
& Hillman, 1989; Ullman, 1980; Werkhoven, Snippe &
Koenderink, 1990a; Werkhoven et al., 1990b) and these
data are adequately explained by the first-order motion
energy extraction models.

We consider here two recent studies that attempt to
deal with motion correspondence in texture-defined
motion stimuli. These studies illustrate the difficult
methodological issues that arise in attempting to determine
motion correspondence, and thereby they indicate
the necessity of the more complex paradigm which
we use.

Watson's crossed-phi procedure. Watson (1986) attempted
to measure the spatial frequency specificity of
the perceptual mechanism responsible for texture-defined
motion. He used a “crossed phi” method, in
which two adjacent texture patches (A and B) in frame
1 exchanged positions in frame 2. The patches were
Gaussian-windowed sine waves (Gabor patches). Observers
reliably perceived apparent motion between the
locations when A and B were different spatial frequencies.
No apparent motion was reported when the patches
were of similar spatial frequencies. Watson interpreted
his results in terms of a model in which motion estimates
are computed separately within different spatial frequency
bands. He used the increasing probability of
apparent motion with increasing differences in spatial
frequency to estimate the spatial frequency selectivity of
the motion channels. Furthermore, it was implicitly
assumed that such a model was equivalent to a correspondence
computation.

In our view, the ambiguous “crossed-phi” paradigm
admits a simple alternative interpretation in terms of
a single energy channel model. Suppose there were just a
single energy channel, and suppose that texture A happened
to produce a bigger response from the texture
grabber in this channel than texture B. Then, the change
in position of patch A would produce a strong motion
response in this channel; the change of position of patch
B would produce a weak motion response in the opposite
direction; net movement would be perceived in the
A direction. The critical observation for a multichannel
model is motion transparency, that is, that motion of the A
and B patches be seen simultaneously in opposite directions.
Only then can we be sure that more than one
channel is activated. In fact, such motion transparency
was not reported by Watson, and, in our experience, it
does not occur in such stimuli. Thus, Watson’s experiment
does not support a theory of multiple correspondence
channels.
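
The single-channel reading of the crossed-phi display can be written down as a toy calculation. The sketch assumes that the motion response to each displaced patch is the product of its activity in the two frames (the energy rule, with the identity as the increasing function) and that the two opposite responses simply subtract; both are simplifying assumptions, and `crossed_phi_net_motion` is a hypothetical name.

```python
def crossed_phi_net_motion(v_a, v_b):
    """Signed net response of one energy channel to a crossed-phi display.

    v_a, v_b: texture-grabber activities for patches A and B.
    Positive values mean net motion in the direction patch A moved.
    """
    response_a = v_a * v_a   # patch A matched with itself across frames
    response_b = v_b * v_b   # patch B matched with itself, opposite direction
    return response_a - response_b

print(crossed_phi_net_motion(2.0, 1.0))  # A dominates: net motion with A
print(crossed_phi_net_motion(1.0, 1.0))  # equal activities: no net motion
```

Under these assumptions a single channel already yields net motion whenever the two textures drive the texture grabber unequally, and none when they drive it equally, without any need for separate frequency-tuned channels.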

Green's Gabor patches. Green (1986) studied texture-defined
motion with a rotating annular display similar to
Navon (1976). The type of stimulus used by Green is
schematized in Fig. 1. Call this stimulus I. One temporal
period of I consists of four frames, as shown in Fig. 1.

Each of these frames is comprised of a circle of
alternating patches of two types of texture, texture A and
texture B. From frame to frame, these patches of texture
take rotary steps clockwise around the circle. This rotary
clockwise motion is equivalent to left-to-right motion in
an analogous horizontal display, as indicated by the
dotted lines connecting annular frames to horizontal
frames.

Let T be an arbitrary texture grabber, and suppose
that v_A is the average response of T to texture A and v_B
is the average response of T to texture B. Then the
output from texture grabber T in response to stimulus I
is a spatiotemporal function whose average value over
any patch containing texture A is v_A and whose average
value over any patch containing texture B is v_B.
Although there will certainly be variability in the T-output
within a given texture patch, this intra-patch variability
is not critical to the global motion percept elicited
by I. What determines this global motion percept are the
average T-output values, v_A and v_B, of patches of the two
textures A and B.

FIGURE 1. Green's stimulus, I. One temporal period of I consists of
four frames. Each of these frames is comprised of a circle of alternating
patches of two types of texture, texture A and texture B. From frame
to frame, these patches of texture take rotary steps clockwise around
the circle. This rotary clockwise motion is equivalent to left-to-right
motion in an analogous horizontal display, as indicated by the dotted
lines connecting annular frames to horizontal frames.

As many authors have observed (e.g. Adelson &
Bergen, 1985; van Santen & Sperling, 1985), motion
detection can be viewed as the detection of orientation
in space-time. As is clear from inspection of Fig. 1(a),
any motion detection mechanism that adheres to this
general principle is bound to register clockwise motion
in response to I whenever v_A ≠ v_B.

In light of these observations, it is not surprising that
observers in Green’s experiment tended to perceive
clockwise motion in displays such as I. In a critical sense,
the clockwise motion of I is intrinsic to the format of the
stimulus, and has little to do with the textures A and B
comprising the patches of I (see Werkhoven et al.,
1990b). Nonetheless, Green took his results as support
for the view that similar textures tend to match with each
other in generating motion-from-texture.

Motion  metamers 

A psychophysical equivalence relation on a set Ω of
physical stimuli is called a metamerism. Equivalent
elements A and B of Ω are called metamers. Typically,
metamerisms are defined using discrimination tasks. For
example, if A and B are two illuminated patches that
differ in spectral composition, we say they are metamers
if an observer cannot distinguish between them.

In this paper, we focus on a different sort of
metamerism that we call motion metamerism. Let Ω
represent a set of texture patches that vary in spatial
frequency, orientation, and contrast. The relation that
we wish to capture is the following: for any two textures
A and B in Ω, we call A and B motion metamers if and
only if any occurrence of A in any dynamic visual
display can be replaced by a patch of B without influencing
the global motion percept elicited by that display.
That is, A and B are motion metamers if and only if A
and B are equivalent inputs to the mechanism that
computes texture-defined motion. Obviously, A and B
need not be equivalent inputs for other perceptual
processes; as we shall show, motion metamers may
appear quite different.

It is impractical to interchange A and B in all possible
motion stimuli to verify that they are motion metamers.
Instead, we use only two extreme test stimuli, in which
any failure of metamerism would be most likely
to appear. The essential core of the test we use is
defined in terms of the stimuli I_1 and I_2 diagrammed in
Fig. 2.

Each of these two stimuli pits two symmetrically
opposite motion paths against each other. Stimulus I_1
pits a path comprised of a patch of texture A and a patch
of texture B against a path comprised of two patches of
texture A, whereas stimulus I_2 pits a path comprised of
a patch of texture B and a patch of texture A against a
path comprised of two patches of texture B. We presume
that each of these paths has an associated motion
strength, and that the global motion percept (left vs
right) elicited by one of these stimuli depends only on
which of its two paths has greater motion strength. In
the case in which the global motion percept is ambiguous
we assume that the strengths of the two component
paths are equal.

For any textures A and B in Ω, we say A and B are
transition invariant* if and only if the leftward vs
rightward motion of each of I_1 and I_2 diagrammed in
Fig. 2 is ambiguous (i.e. if each of I_1 and I_2 is equally
likely to elicit a global rightward or leftward motion
percept).

If  textures  A  and  B  are  transition  invariant,  then  the 
motion  strength  of  a  match  between  A  and  A  is  equal 
to  the  motion  strength  of  a  match  between  A  and  B,  and 
the  motion  strength  of  a  match  between  B  and  A  is  equal 
to  the  motion  strength  of  a  match  between  B  and  B. 

If A and B are motion metamers, then stimuli I_1 and
I_2 are ambiguous in motion content; hence, A and B are
transition invariant.

On the other hand, for practically all plausible
texture-defined motion computations, if A and B are transition
invariant, then they are also motion metamers. Indeed,
the data we present make it clear that this is true of the
computation that is actually used to compute
texture-defined motion.
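
For the special case of a single energy channel, the link between transition invariance and equal activities can be sketched as follows. The sketch assumes that path strength is simply the product of the two activities on the path and that a stimulus is ambiguous exactly when its two path strengths are equal; the function names and the tolerance are hypothetical.

```python
def path_strength(v_first, v_second):
    """Energy rule for one motion path (product of the two activities)."""
    return v_first * v_second

def is_transition_invariant(v_a, v_b, tol=1e-9):
    """True if both test stimuli are balanced for activities v_a and v_b.

    I_1: path A -> B competes with path A -> A.
    I_2: path B -> A competes with path B -> B.
    """
    i1_balanced = abs(path_strength(v_a, v_b) - path_strength(v_a, v_a)) <= tol
    i2_balanced = abs(path_strength(v_b, v_a) - path_strength(v_b, v_b)) <= tol
    return i1_balanced and i2_balanced

print(is_transition_invariant(1.5, 1.5))  # equal activities
print(is_transition_invariant(1.5, 1.0))  # unequal activities
```

With positive activities, both balances hold only when v_a equals v_b, so in this simplified setting two textures that look different but produce the same activity would be interchangeable for the channel.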


*The reason for this term will be clear in Transition Invariance and
Motion Metamers.

FIGURE 2. The binary relation ~ (transition invariance). (a) A schematic
diagram of stimulus I_1. I_1 contains two frames. In the first frame
there is a single patch of texture of type A. In the second frame, there
are two patches of texture, one of type B and another of type A. These
patches of texture are offset equal distances to the right and left of the
location in frame 1 of the single patch of texture A. The stimulus I_1
sets up a competition between one motion path containing a patch of
texture A and a patch of texture B and another, opposite motion path
containing two patches of texture A. (b) A schematic diagram of
stimulus I_2. For any textures A and B, we set A ~ B just if the stimuli
I_1 and I_2 diagrammed in (a) and (b) respectively are both ambiguous
in global motion content. That is, both stimuli I_1 and I_2 are equally
likely to elicit global percepts of rightward or leftward motion. Any
textures A and B for which A ~ B are said to be transition invariant.
For a broad range of motion computations, it can be shown that for
any textures A and B, if A ~ B, then A and B are motion metamers
in the strong sense (A and B can be freely traded for each other in any
stimulus without changing the global motion percept elicited by that
stimulus).


Motion  competition  schemes 

The matching technique could be applied to a variety
of ambiguous motion schemes for determining the
dimensionality of the motion computation. However,
not all of them have the power to discriminate between
different types of motion channels (see e.g. the discussion
on Green’s display). We used an ambiguous motion
scheme that was introduced by Werkhoven et al.
(1990b). In this motion competition scheme, one heterogeneous
motion path (between patches of texture s and
texture v) competes directly with one homogeneous path
(between patches of texture s).

By varying the properties of the textures v, we can
determine the heterogeneous motion paths s, v that
are equal in strength to a certain homogeneous path
s, s.

Werkhoven et al.'s competition scheme not only allows
us to determine the dimensionality of the motion
computation, but also the number
and type (energy vs correspondence) of channels involved
in the motion computation. This requires a
thorough analysis (given in the Model section).

However, an intuitively clear property of this scheme
is that the two types of motion channels considered
above (energy vs correspondence channels) yield qualitatively
different predictions for motion metamery and
the relative strength of the heterogeneous and homogeneous
motion paths. Hence, they are easily discriminated.

A  preview 

Dimensionality of the computation. In this paper, we
discuss a general motion computation consisting of
multiple motion channels, where each channel may be
either an energy channel or a correspondence channel.
By studying the above competition scheme with many
different pairs of texture patches (Expts 1 and 2), we can
determine classes of transition invariant textures
(motion metamers) and infer the dimensionality of the
motion computation (Model section). The results
strongly support the view that texture-defined motion is
computed by a single energy channel.

METHOD 

In  this  section  we  describe  the  ambiguous  motion 
competition  scheme  used  in  the  experiments.  This 
scheme  (proposed  by  Werkhoven  et  al.,  1990b)  differs 
from  other  schemes  (e.g.  Burt  &  Sperling,  1981;  Green, 
1986;  Navon,  1976;  Shechter  et  al.,  1989;  Ullman, 
1980)  in  that  it  contains  a  single  heterogeneous  motion 
path  (between  patches  of  texture  1  and  texture  2)  that 
competes  directly  with  a  single  homogeneous  motion 
path  (between  identical  patches  of  texture  2).  Except 
for  textural  properties,  the  other  parameters  (such  as 
step  size  and  frame  rate)  of  the  motion  paths  are 
identical. 

Instead of varying both textures 1 and 2, we sampled
a subspace of possible textures resulting in two (similar)
schemes: Scheme I and Scheme II. In Scheme I, we kept
texture 2 constant (now texture s) and varied texture 1
(now texture v).

Stimulus 

Motion competition Scheme I. In Expt 1, we used
motion competition Scheme I. The motion stimulus
consisted of a series of eight frames
shown successively in time. Figure 3 shows a sketch of
the frames.

The first frame (f_1) contains an annulus of patches of
alternating texture types s and v at regular positions (see
Fig. 3, at the left side). Because the viewing distance was
constant throughout the experiment, we will specify
dimensions in degrees of visual angle. The annulus of
texture patches has an inner radius of r_1 = 1.04 deg, and
an outer radius of r_2 = 2.08 deg. The mean radius r is
1.56 deg. The patches (or sectors) are spatially contiguous.
Since the annulus contains eight sectors, each sector
has a width of 45 deg.

Frame f_2 was similar to frame f_1 except that patches
of texture v are replaced by a uniform patch of background
luminance. Furthermore, f_2 was rotated around
the center of the annulus 22.5 deg with respect to frame
f_1 (see Fig. 3, left).

In a sequence of frames, the locations and types of
patches in frame f_{n+2} were identical to frame f_n, except
for a rotation around fixation of 45 deg.

The presentation time of a single frame (“frametime”)
was 133.3 msec. Thus, the presentation time of
the eight-frame sequence was 1.066 sec. The annulus
revolved at an angular speed of 168.8 deg/sec, yielding a
local velocity of the patch-centers of 4.6 deg of visual
angle per second.
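
The timing figures quoted above follow from the frame time, the 22.5 deg rotation per frame and the mean radius; the short Python check below reproduces the arithmetic. The variable names are hypothetical, and the local velocity is computed as arc length per second at the mean radius, which is an assumption about how the quoted value was obtained.

```python
import math

frame_time_s = 0.1333        # presentation time of one frame
n_frames = 8                 # frames in the sequence
step_deg = 22.5              # rotation of the annulus per frame
mean_radius_deg = 1.56       # mean radius, in deg of visual angle

# Total presentation time of the eight-frame sequence.
sequence_duration_s = n_frames * frame_time_s

# Angular speed of the annulus (deg of rotation per second).
angular_speed_deg_s = step_deg / frame_time_s

# Local velocity of the patch centres (deg of visual angle per second):
# arc length = rotation in radians times radius.
local_velocity_deg_s = math.radians(angular_speed_deg_s) * mean_radius_deg

print(round(sequence_duration_s, 3))
print(round(angular_speed_deg_s, 1))
print(round(local_velocity_deg_s, 1))
```

The same arithmetic gives 45 deg of rotation over two frames, which equals the width of one sector.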

The  ambiguous  motion  stimulus  described  above  con¬ 
tains  two  motion  paths.  This  can  be  understood  most 
easily  using  a  diagram  in  which  we  show  the  angular 


FIGURE 3. Motion competition Scheme I. Left: a series of frames
f_1, f_2, ... is shown successively in time (for details see Methods
section). The first frame (f_1) contains an annulus of patches of
alternated texture type s and v at regular positions drawn against a
uniform background. The annulus has an inner radius of 1.04 deg
of visual angle, and an outer radius of 2.08 deg. The patches of
texture s and texture v are spatially contiguous and alternate within the
annulus. Since the annulus contains eight patches, each patch has a
width of 45 deg. Angular position φ is measured clockwise with respect
to the vertical. The second frame (f_2) is similar to frame f_1, except that
the low frequency patches of texture v are now replaced by a uniform
patch of background luminance. Furthermore, f_2 is rotated (clockwise)
around the center of the annulus over an angle of 22.5 deg with respect
to frame f_1. In a sequence of frames, frame f_{i+2} is identical to frame
f_i except for a rotation around the center over an angle of 45 deg
(clockwise). Right: angular position φ is along the horizontal axis.
Patches of texture s and v are shown at their angular positions for
frames f_1 ... f_4, yielding rows of patches. The top row of patches s and
v corresponds to frame f_1. The second row of patches s corresponds
to frame f_2. Hence, time (or frame number) is along the vertical axis.
When frame f_i and frame f_{i+1} are presented in succession, two motion
paths are a priori likely. A homogeneous motion path: clockwise
matches (CW) between patches of identical texture s (indicated by
the arrow pointing down and right). A heterogeneous motion
path: counter-clockwise (CCW) matches between patches of texture s
and patches of texture v (indicated by the arrow pointing down and
left).

positions (φ) of the patches of texture for successive
frames.  Angular  position  is  measured  clockwise  relative 
to  the  vertical.  Such  a  diagram  is  shown  in  Fig.  3,  at  the 
right  side.  Note  that  the  horizontal  rows  of  patches 
correspond  to  frames  1,  2,  3  and  4  respectively.  By 
definition,  motion  extraction  is  based  on  the  dynamic 
properties  of  the  stimulus,  that  is  the  spatiotemporal 
pattern  of  textures.  In  the  diagram,  possible  motion 
paths  are  spatiotemporal  (oblique)  rows  of  elements. 
The  arrows  pointing  to  the  left  and  right  are  examples 
of motion paths to the left and right respectively. In the
following description of the stimulus, we will say that the
neighboring elements in a motion path are spatiotemporally
linked or “matched”. Note that the term “matching” is used
for the purpose of stimulus description only and that it does
not refer to a “motion correspondence” computation.
When frame f_i and frame f_{i+1} were presented in
succession, two matches between patches of frame f_i and
patches of frame f_{i+1} were a priori possible. The first
match is a homogeneous clockwise match between
patches of identical texture s separated by +22.5 deg
(indicated in the diagram by the arrow pointing down
and to the right). The second match is a heterogeneous
counter-clockwise match between patches of texture v
and patches of texture s (-22.5 deg, indicated by the
arrow pointing down and to the left). Matches between
frames f_i and f_{i+2} are entirely ambiguous. Matches
between patches of frames f_i and f_{i+3} involve large
temporal separations (400 msec) relative to the equivalent
matches between frames f_i and f_{i+1} (133.3 msec). It
has been shown that motion strength decreases strongly
and monotonically with temporal interval for intervals
larger than approx. 30 msec (Burt & Sperling, 1981;
Werkhoven & Koenderink, 1991). Therefore, the
matches between frames f_i and f_{i+3} are unimportant for
motion perception in these stimuli.

Scheme I displays contain homogeneous and heterogeneous
motion paths in opposite directions. By
randomizing the direction of rotation, the directions of
the two motion paths (although still opposite) are
randomized.

The  annular  pinwheel  stimulus  was  used  for  various 
reasons.  First,  the  motion  stimulus  was  presented  at  a 
constant  eccentricity  in  the  parafovea,  and  the  effects  of 
anisotropy  of  the  retina  were  averaged  across  equivalent 
areas  of  the  visual  field.  Second,  it  was  easier  to  maintain 
fixation  so  eye  movements  were  better  controlled.* 
Finally  (with  the  use  of  circularly  symmetric  stimuli)  a 
motion  path  does  not  end  at  the  boundaries  of  the 
display,  avoiding  edge  effects. 

Motion competition Scheme II. Scheme II (used in
Expt 2) is equivalent to Scheme I, except that textures s
and v are interchanged. The motion stimulus and resulting
motion paths for this experiment are shown in Fig. 4.

Although the heterogeneous motion path (between
patches of texture s and v) is identical to that of Scheme
I, the homogeneous motion path is different from that
of Scheme I. In Scheme II, the homogeneous motion
path consists of patches of texture v. The critical
importance of the two schemes for our paradigm concerns the
question  of  whether,  when  a  particular  s  and  v  are 
chosen  so  that  motion  paths  are  balanced  in  Scheme  I, 
the  paths  will  remain  balanced  when  the  same  s  and  v 
are  used  in  Scheme  II.  From  the  subjects’  point  of  view, 
however,  there  is  no  difference  between  the  two  schemes 
because,  for  any  stimulus  generated  by  Scheme  I,  an 
identical  stimulus  can  be  generated  by  Scheme  II. 


*Torsional eye-movements induced by the rotating annuli (cyclo-
induction) were not controlled in our experiment. Balliet and
Nakayama (1978) reported the ability of extremely trained subjects
to make stepwise eye torsions up to rotations of approx. 26 deg for
large field stimuli (25-50 deg of visual angle). However, we do not
expect torsional pursuit in our experimental conditions: small field
stimuli, brief presentations, fast motion, unpredictable motion
direction, and ambiguous or near-threshold motion stimuli.


FIGURE 4. Motion competition Scheme II. This scheme is similar to
Scheme I (see Fig. 3), except that textures s and v are interchanged.
In Scheme II, the homogeneous motion path contains textures v.


However,  during  the  course  of  a  session,  when  v  is  varied 
between  trials,  different  families  of  stimuli  are  generated 
by  the  two  schemes. 

Texture  stimuli 

The textures used to characterize texture-defined
motion are patches of sinusoidally modulated gratings
that differ in spatial frequency and amplitude. The
grating patches were arranged in eight sectors of an
annulus (pinwheel) around the fixation point with the
grating extending radially in each sector. Two critical
parameters that characterize a texture patch at a given
location of the pinwheel are amplitude m and spatial
frequency ω. Within a location, grating orientation was
always radial. The phase γ of the grating was a random
variable with a uniform distribution.

We use polar coordinates to further characterize the
pinwheel. Let φ be the polar angle of a point in the
image, and ρ be the distance to the origin (the center of
the annulus). Then the luminance distribution at the
point ρ, φ in sector j of frame i is:

$$ l_{ij}(\rho, \varphi) = l_0 \left[ 1 + m_{ij} \sin\left( 2\pi \bar{r} \varphi\, \omega_{ij} + \gamma_{ij} \right) \right]. \tag{1} $$

We define the mean spatial frequency as the spatial
frequency at mean radius r̄. The mean spatial frequency
ω_{ij} of a texture patch depends only on whether j is odd
or even. That is, two spatial frequencies, ω_s and ω_v, strictly
alternate between adjacent patches on every frame of the
display.

Within a trial, the amplitude m_{ij} of a sector ij
depended only on whether i and j were even or odd. On
odd frames, m_{ij} was chosen as m_s or m_v according to
whether the sector j was even or odd. On even frames,
sector amplitude m_{ij} alternated between 0 and m_s in
Scheme I and between m_v and 0 in Scheme II. Between
trials, m_v and ω_v were changed. Sixteen values of
amplitude m_v from 0 to 1 were used increasing by steps
of 0.0625: 0, 0.0625, 0.13, ..., 1. Spatial frequency ω_v
was varied over a range of three octaves:
1.2, 2.5, 3.7, 4.3, 4.9, 5.6, 7.4 or 9.9 c/deg. The amplitude
m_s and spatial frequency ω_s of texture s were constant
throughout the experiment: m_s = 0.5, ω_s = 4.9 c/deg.

The phase γ_{ij}, 0 < γ_{ij} < 2π, was chosen randomly and
independently for every combination of i and j, that is,
for every single patch. The phase randomization of every
patch makes the motion of the stimulus inaccessible to
any first-order (Fourier-based) mechanism. Phase
randomization ensures that motion mechanisms sensitive to
correspondences in stimulus luminance were not
systematically engaged (Chubb & Sperling, 1988).

Figure 5 shows an example of a series of frames for
Scheme I. Texture s is a "medium" frequency grating
and texture v is a "low" frequency grating. The regions
inside and outside the annulus (background) were
uniform gray and had a luminance value (l_0 = 72 cd/m²).
Within the annulus' texture patches the expected
luminance value was equal to the background luminance.
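
As a rough illustration of Eq. (1), the sketch below renders one pinwheel frame with an independent random phase per patch. It is only a minimal reading of the stimulus description, not the authors' display code: the function name `pinwheel_frame`, the image size, the pixel scale `deg_per_px`, and the amplitude chosen for texture v are all assumptions made for the example.

```python
import numpy as np

def pinwheel_frame(amps, freqs, phases, rotation_deg=0.0, size=256,
                   deg_per_px=0.02, r_in=1.04, r_out=2.08, l0=72.0):
    """Render one frame of the annular pinwheel following Eq. (1).

    amps, freqs, phases: per-sector amplitude m, mean spatial frequency
    omega (c/deg) and phase gamma (rad). Returns luminance in cd/m^2.
    """
    amps, freqs, phases = (np.asarray(a, dtype=float)
                           for a in (amps, freqs, phases))
    n = len(amps)
    # Pixel-center coordinates in degrees of visual angle, origin at center.
    c = (np.arange(size) - (size - 1) / 2.0) * deg_per_px
    dx, dy = np.meshgrid(c, -c)            # dy is positive upward
    rho = np.hypot(dx, dy)
    # Polar angle measured clockwise from the vertical, in [0, 2*pi).
    phi = np.mod(np.arctan2(dx, dy) - np.deg2rad(rotation_deg), 2 * np.pi)
    sector = np.floor(phi / (2 * np.pi / n)).astype(int) % n
    r_mean = 0.5 * (r_in + r_out)          # mean radius in deg
    # r_mean * phi is arc length (deg) at the mean radius, so bars are radial.
    grating = np.sin(2 * np.pi * r_mean * phi * freqs[sector] + phases[sector])
    lum = l0 * (1.0 + amps[sector] * grating)
    inside = (rho >= r_in) & (rho <= r_out)
    return np.where(inside, lum, l0)       # uniform background elsewhere

rng = np.random.default_rng(0)
m_s, w_s, m_v, w_v = 0.5, 4.9, 0.25, 1.2   # m_v is an illustrative choice
phases = rng.uniform(0.0, 2 * np.pi, size=8)   # independent phase per patch
frame1 = pinwheel_frame([m_s, m_v] * 4, [w_s, w_v] * 4, phases)
```

Passing `rotation_deg=22.5` together with the even-frame amplitudes described above (e.g. `[m_s, 0.0] * 4` for Scheme I) would give the second frame of a sequence under the same assumptions.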

Apparatus 

The experiment was controlled by an IBM 386 PC
compatible computer, driving a Truevision AT-Vista
video graphics adapter. A 60 Hz Imtec 1261L monitor
with  a  P4-type  phosphor  was  used  to  display  the  stimuli. 
The  screen  dimensions  were  21.8  x  14  cm  (640  x  480 
pixels;  12.3  x  8.0  deg  visual  angle).*  We  used  a  look-up 
table  to  linearize  the  monitor's  luminance  values  with  the 
gray  values  of  the  computed  stimulus  patterns.  The 
decay time to 10% and 1% intensity was about 1.3 and
6.2 msec respectively, which is shorter than the temporal
properties of retinal processing (Farrell, Pavel &
Sperling, 1990; Sperling, 1976).

Subjects 

Two subjects participated in the experiments: one of
the authors (PW) and a colleague (JS). PW is
emmetropic. JS is myopic (-0.5 D) but was in focus for
the viewing distance used. Both subjects were
experienced psychophysical observers. Natural pupils,
binocular viewing, and spectacle corrections were used
throughout. Several naive subjects confirmed the main
findings for the experiments.

Procedure 

Subjects indicated the dominant motion path
(counter-clockwise/clockwise) by pressing one of two buttons.
In both experiments, texture s (the standard texture) had
amplitude m_s = 0.5 and spatial frequency ω_s = 4.9 c/deg.


*Due to the limited bandwidth of the video amplifier (30 MHz) of the
monitor, an anisotropy was observed for the average luminance of
differently oriented textures that contain high spatial frequencies.
Therefore, we only displayed the pixels at column position m and
row position n for which (m + n) was even. The other pixels were
dark. Hence, vertical and horizontal gratings share a common
"carrier" component. This procedure forfeits maximum luminance
and resolution in favor of eliminating anisotropy; the net resolution
(320 x 240 pixels) was more than adequate for the displays.


From trial-to-trial, the spatial frequency ω_v and
amplitude m_v of texture v were varied. The experiments
determined the probability P_i(m_v; ω_v) of perceptual
dominance of the heterogeneous motion path as a
function of m_v for certain ω_v using the method of
constant stimuli. The subscript i, i = 1, 2, indicates Expt
1 with competition Scheme I (Fig. 3) or Expt 2 with
Scheme II (Fig. 4).

The probabilities P_1(m_v; ω_v) and P_2(m_v; ω_v) are
estimated by the fraction of perceptually dominant
heterogeneous motion paths out of 36 presentations. Spatial
frequency ω_v was varied over a range of three octaves:
ω_v = 1.2, 2.5, 3.7, 4.3, 4.9, 5.6, 7.4 and 9.9 c/deg. Within
a session, amplitude m_v was varied pseudo-randomly
from trial-to-trial; ω_v was varied only between sessions.
For each spatial frequency ω_v, Expts 1 and 2 were both
conducted within one session.

Subjects  viewed  the  stimuli  in  a  room  with  dimmed 
background  illumination. 

EXPERIMENT  1:  SCHEME  I 

Results 

By definition, the homogeneous path (consisting
entirely of identical patches of texture s) does not change
in this experiment when texture v is varied (see Scheme
I, Fig. 3). The strength of the heterogeneous path, which
is composed of alternate patches of textures s and v, is
varied by varying spatial frequency and amplitude, ω_v
and m_v, of texture v. Figure 6 shows the probability
P_1(m_v; ω_v) of reporting the heterogeneous motion path
as dominant as a function of the amplitude m_v of texture
v. Each panel shows P_1(m_v; ω_v) for a different value of
spatial frequency ω_v.

The data show that the probability of reporting the
heterogeneous path as dominant increases monotonically
from 0 (for small m_v) to 1 (for m_v = 1) for all values
of ω_v except the highest, where the probability of
heterogeneous motion dominance has only reached
about 65% when m_v = 1. A remarkable feature of these
data is that in all eight panels, the probability P_1(m_v; ω_v)
of heterogeneous motion dominance exceeds 50% for
sufficiently high amplitude of patch v.

The upper left panel of Fig. 6 shows data for a two
octave difference between the spatial frequency of
texture s (ω_s = 4.9 c/deg) and the spatial frequency of
texture v (ω_v = 1.2 c/deg). Heterogeneous motion is
perceived in 50% of the presentations when the amplitude
m_v of texture v is approx. 0.2. Note that at this balance
point where both paths are equally likely, both the
amplitudes and the spatial frequencies of textures s and
v are markedly different. Once m_v exceeds 0.5, the
heterogeneous motion path is dominant in 100% of the
presentations. A 100% perceptual dominance of a
heterogeneous over a homogeneous path demonstrates that
the similarity between the textures in a motion path
certainly is not essential for motion strength. Indeed, for
sufficiently large m_v, the heterogeneous path is dominant
over the homogeneous path for every combination of
frequencies tested in Fig. 6.
FIGURE 5. An example of the ambiguous motion display (as sketched in Fig. 3). Frames f_1, f_2, f_3 and f_4 (containing the
patches of textures) are shown in (a), (b), (c) and (d) respectively. For this example, textures s and v differ only in their spatial
frequency: the spatial frequency of texture s is two octaves higher than that of texture v.


The transition amplitudes between heterogeneous and
homogeneous motion occur where the curves of Fig. 6
cross 50%. The transition amplitudes occur at a wide
range of different amplitudes m_v for different spatial
frequencies ω_v. Each P_1 curve is well characterized by
two parameters: the transition amplitude μ_1(ω_v) and the
steepness σ_1(ω_v) at the transition amplitude (the
subscript 1 indicates Scheme I). The transition amplitude
μ_1(ω_v) is defined as the amplitude m_v of texture v
necessary for balancing the motion paths [such that
P_1(μ_1; ω_v) = 50%]; the steepness σ_1(ω_v) is defined as the
derivative of P_1(m_v; ω_v) with respect to m_v at the
transition amplitude.

To estimate transition amplitude μ_1(ω_v) and steepness
σ_1(ω_v), we selected* data points of each probability


*In principle, we selected the three data-points around the transition
amplitude (the crossing of the curves with the 50% guide line) that
were closest to the 50% guide line. There were only two exceptions.
First, at spatial frequency ω_v = 1.2 c/deg, for subject PW, Expt 2,
we selected the data points with amplitude m_v = 0.19, 0.25 and 0.31
(to avoid the low amplitude values, for which Scheme II becomes
ambiguous). Second, at spatial frequency ω_v = 2.5 c/deg, for
subject JS, Expts 1 and 2, we selected the data points with amplitude
m_v = 0.38 and 0.5 (since we had no data points close to the guide
line).


curve around the transition amplitude. Within this
selected range, the curve was assumed to be linear, and
these data points were subject to a least square method
of linear regression to estimate the regression coefficients
μ_1(ω_v) and σ_1(ω_v).
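
The local linear fit described above can be sketched as follows. This is a minimal illustration rather than the authors' analysis code: the function name `transition_estimate` is hypothetical, and the three data points are made-up numbers chosen only to show the computation.

```python
import numpy as np

def transition_estimate(m_v, p_hetero):
    """Fit p = a + b*m by least squares over the selected points.

    Returns (mu, sigma): the amplitude at which the fitted line crosses
    0.5, and the slope of the line at that point.
    """
    slope, intercept = np.polyfit(np.asarray(m_v, dtype=float),
                                  np.asarray(p_hetero, dtype=float), 1)
    mu = (0.5 - intercept) / slope
    return mu, slope

# Illustrative numbers only (not data from the experiments).
mu, sigma = transition_estimate([0.125, 0.1875, 0.25], [0.25, 0.50, 0.75])
```

With only three points per curve the slope estimate is necessarily coarse, which is in line with the limited accuracy reported for the steepness values.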

Estimates of μ_1(ω_v) are shown in Fig. 7 as a function
of the varied spatial frequency ω_v (open circles). The
transition amplitude μ_1(ω_v) increases systematically with
increasing spatial frequency ω_v of texture v for both
subjects. Together, the data of Figs 6 and 7 indicate that
the strength of the heterogeneous motion path increases
with increasing amplitude m_v, but decreases with
increasing spatial frequency ω_v.

Estimates of σ_1(ω_v) are shown in Fig. 8 as a function
of the varied spatial frequency ω_v (open circles). The
steepness σ_1(ω_v) of the probability curves at the transition
amplitude decreases with the spatial frequency ω_v
of texture v. In the Model section we elaborate on this
finding.

Discussion 

Sufficiency of a single energy-channel. In a single
energy-channel, we assume that only one single type of
texture grabber operates on the input yielding an activity
representation of the input. Motion strength is the result
of a motion energy analysis scheme applied to this
activity representation. The motion strength of a path is
computed from the product of activity measures between
successive patches along the path in space-time. Motion
strength of a heterogeneous path balances homogeneous
motion strength when the responses (activities) to
textures v and s are equal. Differences in textural properties
between elements s and v are irrelevant as long as the
activities are equal, just as, in scotopic vision, differences
in wavelength are irrelevant as long as the rod response
is the same.

The results for Scheme I suggest an activity
transformation that is a monotonically increasing function of
amplitude and a monotonically decreasing function of
spatial frequency. For example, to balance the activity of
texture s, with amplitude m_s and spatial frequency ω_s,
with a lower spatial frequency texture v (ω_v < ω_s)
requires an amplitude m_v < m_s. This pattern of results
suggests a single class of texture grabbers consisting of a
low-pass spatial filter followed by rectification.
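
A toy version of such a texture grabber makes the balance condition concrete. The sketch below is an assumption-laden illustration, not the model fitted in this paper: the gain function `lowpass_gain`, its functional form and its cutoff of 3 c/deg are invented for the example, and rectification of a linearly filtered grating is represented simply by taking the filtered amplitude.

```python
import math

def lowpass_gain(omega, cutoff=3.0):
    """Hypothetical low-pass gain; shape and cutoff (c/deg) are assumptions."""
    return 1.0 / (1.0 + (omega / cutoff) ** 2)

def activity(m, omega):
    """Texture-grabber output: filtered amplitude after rectification."""
    return abs(m) * lowpass_gain(omega)

def path_strength(act_a, act_b):
    """Single energy-channel: product of activities of successive patches."""
    return act_a * act_b

def balancing_amplitude(m_s, omega_s, omega_v):
    """Amplitude of texture v whose activity equals that of texture s."""
    return m_s * lowpass_gain(omega_s) / lowpass_gain(omega_v)

# Standard texture s against a lower-frequency texture v.
m_v = balancing_amplitude(0.5, 4.9, 1.2)
a_s, a_v = activity(0.5, 4.9), activity(m_v, 1.2)
# At balance the s, v path and the s, s path have equal strength.
balanced = math.isclose(path_strength(a_s, a_v), path_strength(a_s, a_s))
```

Under any such low-pass gain the balancing amplitude of a lower-frequency texture v comes out below m_s, which is the qualitative pattern described above; the particular value depends entirely on the assumed gain.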

We argued that a single energy-channel is sufficient to
explain the results of Expt 1. It is important to note here,
however, that our finding that heterogeneous motion can
dominate homogeneous motion is also consistent with
multiple energy-channels, as will be shown in the Model
section. For example, the dominance of heterogeneous
motion may well be the result of two independent
energy-channels, both favoring heterogeneous motion.
To uniquely determine the number of channels involved,
we need the results for competition Scheme II together
with a formal analysis (Model section).

Secondary contributions of a correspondence-channel.
In the Discussion above, we argued that a single-channel
model is sufficient to model the (amplitude/frequency
dependent) dominance of heterogeneous motion found
for Scheme I. However, we cannot exclude a possible
secondary effect of texture similarity based on this
scheme. To motivate Expt 2, we need to elaborate on this
argument.

Although  motion  perception  may  be  dominated  by  a 
single  energy-channel,  there  may  yet  be  a  secondary 
contribution  of  a  correspondence-channel. 

The  relative  strength  of  the  heterogeneous  motion 
path  would  decrease  as  the  differences  between  the 
spatial  frequencies  and  amplitudes  of  successive  patches 
of  textures  s  and  v  increased.  Suppose  there  were  a 


FIGURE 6. Probability P_i(m_v; ω_v) of dominance of a heterogeneous motion path over a homogeneous motion path is shown
as a function of the amplitude m_v of texture v for different spatial frequencies ω_v of texture v for two subjects. Open circles
represent the probability P_1(m_v; ω_v) for Scheme I (Fig. 3); solid circles P_2(m_v; ω_v) for Scheme II (Fig. 4). The horizontal dashed
guide line indicates a 50% probability of heterogeneous motion dominance. The amplitude m_s and spatial frequency ω_s of
texture s are the same for all panels: m_s = 0.5 and ω_s = 4.9 c/deg. (a) Subject PW; (b) subject JS.


secondary contribution of a correspondence-channel in
Expt 1, sensitive to differences between textures in either
amplitude or frequency. Because the correspondence-channel
favors the homogeneous path (by definition),
motion balance requires v in the heterogeneous path to
have a higher amplitude to overcome the similarity in
path s, s than if there were no correspondence-channel.
Thus, in Scheme I, a secondary correspondence effect
would displace transition amplitude μ_1(ω_v) to higher
values.

To test for a correspondence-channel, we introduce
Scheme II in which s and v are interchanged (see Fig. 4).
If there were a correspondence effect, in Scheme II it
would favor the v, v path and the transition amplitude
μ_2(ω_v) would be shifted below μ_1(ω_v) for any texture v.

When the homogeneous and heterogeneous motion
paths remain balanced after interchanging textures s and
v, this is called transition invariance. Transition
invariance would imply that there is no contribution of a
correspondence-channel.

EXPERIMENT 2: SCHEME II

Results 

Figure 6 shows the probabilities P_2(m_v; ω_v) of the
dominance of the heterogeneous motion path as a
function of the amplitude m_v of texture v for different
spatial frequencies ω_v of texture v. The data points for
Scheme II are marked by a solid circle.

When m_v = 0, the display is physically as well as
perceptually ambiguous. A value of 50% is shown for
m_v = 0, though no data were collected at this point. By
varying the amplitude of texture v in this experiment, the
strengths of both the heterogeneous motion path and the
homogeneous motion path are varied. As the amplitude
m_v increases, the probability of heterogeneous motion
FIGURE 7. Transition amplitudes μ_i(ω_v) as a function of spatial frequency ω_v. Open circles for Scheme I, solid circles for
Scheme II. The vertical dashed line indicates the spatial frequency of texture s: ω_s = 4.9 c/deg. The horizontal dashed guide
line indicates the amplitude of texture s: m_s = 0.5.


dominance first increases to a maximum, then decreases
to zero for high amplitude m_v. On the whole, for
amplitudes above 0.1 or, in a few cases, 0.2, the Scheme I
and Scheme II curves are mirror complementary, and
seem to cross at exactly P = 50%. That is, the two
schemes produce remarkably similar transition
amplitudes.

To examine the correspondence between the data
from Schemes I and II, some definitions are needed. Let
the transition amplitude μ_2(ω_v) be the amplitude m_v of
texture v for which the motion paths are balanced, and
the probability of heterogeneous motion dominance
P_2(m_v; ω_v) is 50%. The steepness at this transition
amplitude is σ_2(ω_v). The transition amplitude μ_2(ω_v) and
steepness value σ_2(ω_v) are estimated as μ_1(ω_v) and
σ_1(ω_v) in the previous section.

To compare the transition amplitude μ_2(ω_v) for
Scheme II with transition amplitude μ_1(ω_v) for Scheme
I, they are presented together as a function of spatial
frequency ω_v in Fig. 7. Transitions μ_2(ω_v) are presented
with solid circles. As in Scheme I, the amplitude μ_2(ω_v)
of texture v, necessary for balancing the motion paths,
increases systematically with increasing spatial frequency
ω_v of texture v. An exception for both subjects are the
transition amplitudes for ω_v = 9.9 c/deg.


To compare the steepness values σ_2(ω_v) for Scheme II
with steepness values σ_1(ω_v) (for Scheme I), the absolute
value of σ_2(ω_v) is shown as a function of the varied
spatial frequency ω_v in Fig. 8 (using solid circles). It
should be noted that the estimation is not very accurate:
the standard deviation in the distribution of steepness
coefficient σ_2(ω_v) is approx. 20%. However, like σ_1(ω_v),
the steepness σ_2(ω_v) shows a tendency to decrease with
increasing spatial frequency ω_v of texture v.

Discussion

Transition invariance and motion metamers. It is
immediately clear that, for most spatial frequencies ω_v of
texture v, the transition amplitude μ_1(ω_v) is equal within
measurement error to transition amplitude μ_2(ω_v) (see
Fig. 7). In fourteen of sixteen cases, the transition
amplitudes are invariant when the textures s and v are
interchanged. This we call transition invariance.

In two cases (the highest spatial frequency used,
ω_v = 9.9 c/deg, for both subjects), a small difference
between transition amplitudes for Schemes I and II
is observed. At the high spatial frequency of v, the
amplitude of texture v necessary to balance the motion
paths is slightly smaller for Scheme II than for Scheme
I. This shift in transition amplitude suggests a small
FIGURE 8. Steepness values σ_i(ω_v) as a function of spatial frequency ω_v. Open circles for Scheme I, solid circles for Scheme
II. (Note that to facilitate comparison absolute values are given.) The vertical dashed guide line indicates the spatial frequency
of texture s: ω_s = 4.9 c/deg.
similarity effect (a small contribution of a
correspondence-channel), and was discussed in the Discussion of
Expt 1.

Transition  invariance  implies  that  textures  s  and  v 
(at  transitions)  are  equivalent  with  respect  to  motion 
processing  and  can  be  interchanged  in  any  motion 
path  (Scheme  I  and  Scheme  II)  without  affecting  motion 
strength.  This  leads  to  the  important  conclusion  that 
textures  s  and  v  are  (texture-defined)  motion  metamers. 

It is interesting to note that Green (1986, Fig. 7, p.
604) was unable to find an amplitude that could make
a spatial frequency patch of 5.0 c/deg into a motion
metamer of a 1.7 c/deg patch. We had no difficulty in
finding metamers between even more disparate spatial
frequencies. However, our data in Fig. 7 show that one
of the two subjects would require the 5 c/deg stimulus to
have more than two times the amplitude of the 1.7 c/deg
stimulus, and this is outside the range of amplitudes that
Green explored.

Necessity of a single energy-channel. The general
finding of transition invariance strongly constrains the
possible ways in which motion can be computed between
textures in the class we are considering.

Transition invariance shows that there is no secondary
contribution of correspondence-channels (see the
discussion on this issue in Expt 1). The effect that a
patch of texture v has on the strength of motion is
independent of the other patches in the path. At a
transition, the strength of motion path s, v is equal to
that of v, v and that of s, s, although a correspondence-channel
would yield stronger motion for the
homogeneous paths.

The only alternative is a system of multiple energy-channels
that must be combined and represented by a
single scalar representation (e.g. summation of energy-channels).
In the Model section, we prove (under the
assumption of channel summation) that if multiple
energy-channels were involved, the transition amplitude
would generally shift when the textures s and v are
interchanged in Schemes I and II. However, when
motion perception is exclusively ruled by a single energy-channel
(the product of the activity of a single type
of texture grabber), the transition amplitude is
invariant when the textures s and v are interchanged.
Hence, transition invariance uniquely supports a
single energy-channel model of texture-defined motion
perception.
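
The core of this argument can be sketched in a few lines under the channel-summation assumption. The notation is introduced here for illustration only and need not match that of the Model section: A_k(x) >= 0 denotes the activity of texture x in energy-channel k, and the strength of a path is taken to be the sum over channels of the product of the activities of successive patches.

```latex
\begin{align*}
\text{Scheme I balance } (s,v \text{ vs. } s,s):\quad
  & \sum_k A_k(s)\,A_k(v) = \sum_k A_k(s)^2 \\
\text{Scheme II balance } (s,v \text{ vs. } v,v):\quad
  & \sum_k A_k(s)\,A_k(v) = \sum_k A_k(v)^2 \\
\text{If both hold at the same } m_v:\quad
  & \sum_k \bigl(A_k(s) - A_k(v)\bigr)^2
    = \sum_k A_k(s)^2 - 2\sum_k A_k(s)\,A_k(v) + \sum_k A_k(v)^2 = 0 \\
\Rightarrow\quad
  & A_k(s) = A_k(v) \quad \text{for every channel } k .
\end{align*}
```

With a single channel, each balance condition alone already reduces to A(v) = A(s) (for nonzero activities), so the transition amplitude is the same in both schemes. With several channels, transition invariance would require one amplitude m_v to equate the activities in every channel simultaneously, which in general is not possible when the channels differ in their dependence on spatial frequency.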

EXPERIMENT 3: AMPLITUDE LINEARITY

Motivation

In the above experiments, we have shown that the
transition amplitude μ_1(ω_v) increases systematically
with increasing spatial frequency ω_v of texture v for
both subjects. The strength of the heterogeneous motion
path in Scheme I increases monotonically with
increasing amplitude m_v but decreases with increasing spatial
frequency ω_v. In order to further specify the dependency
of motion strength on amplitude, we performed an
experiment similar to that described above using
competition Scheme I, and varied the amplitude of
texture s.

Results 

We kept the frequency of textures s and v constant
(ω_s = 4.9 c/deg and ω_v = 1.2 c/deg) and measured the
transition amplitude μ_1 as a function of amplitude m_s
(Scheme I). Transition amplitude was estimated from the
psychometric curves using the method described earlier.
Figure 9 shows the transition amplitude μ_1 of texture
v for three amplitude values of texture s (m_s = 0.50, 0.75
and 1.00) for three subjects. The data strongly suggest a
linear dependence of the transition amplitude of texture
v on the amplitude of texture s. The solid lines are the
best fits (minimizing the sum of squares), accounting for
at least 97% of the variance for each subject.

Discussion 

We  showed  that  the  transition  amplitude  of  texture  v 
needed  to  balance  the  motion  path  s,  v  with  the  motion 
path  s,  s  varied  linearly  with  the  amplitude  of  texture  s. 
This  dependency  is  easily  accommodated  in  a  model 
where  the  texture  grabber  is  linear  in  the  amplitude  of 
the  texture.  In  fact,  one  can  easily  show  that  amplitude 
linearity  follows  directly  from  the  linear  data  under  the 
assumption  that  the  texture  grabber  is  a  separable 
function  of  spatial  frequency  and  amplitude.  A  linear 
(low-pass)  spatial  frequency  filter  is  a  simple  example  of 
such  a  separable  filter  characteristic. 
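
To make the first of these statements concrete, the short derivation below shows how a texture grabber that is linear in amplitude predicts a straight line. The factorization A(m, ω) = m g(ω), with g an unspecified spatial-frequency gain, is written out here as an illustrative assumption; only this forward direction is sketched.

```latex
\begin{align*}
A(m,\omega) &= m\,g(\omega)
  && \text{(assumed: separable and linear in } m\text{)} \\
A(\mu_1,\omega_v) &= A(m_s,\omega_s)
  && \text{(balance of paths } s,v \text{ and } s,s \text{ in Scheme I)} \\
\mu_1\,g(\omega_v) &= m_s\,g(\omega_s) \\
\mu_1 &= \frac{g(\omega_s)}{g(\omega_v)}\,m_s
\end{align*}
```

Under this assumption the transition amplitude is proportional to m_s, with a slope g(ω_s)/g(ω_v) that is smaller than 1 when g is low-pass and ω_v < ω_s.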

MODEL 

Summary  of  model  constraints 
We used the analogy with colorimetry and some
general assumptions about the possible motion
computations involved to reach the conclusion that
texture-defined motion strength is ruled by a single
energy-channel. We summarize our reasoning.


FIGURE 9. The dependence of transition amplitude $\mu_1(\omega_v)$ on
amplitude $m_s$ of texture s. The spatial frequency $\omega_s$ was 4.9 c/deg, and
$\omega_v$ was 1.2 c/deg. Competition Scheme I was used. Circles, subject JS;
squares, subject PW. The solid lines show the best linear fit (minimizing
the sum of the squared deviations).

We distinguish two classes of motion computations:
energy-channels and correspondence-channels, yielding
different metrics for the strength of a motion path.
Consider a heterogeneous motion path composed of
patches of texture s and v. The strength of an
energy-channel for an s, v path is determined by the product of
the activity of texture s and that of v. The activity of a
texture is the output of some nonlinear transformation
(texture grabber) that maps texture into a scalar.
Energy-channels are insensitive to differences in textural
properties and allow heterogeneous motion paths s, v to
dominate over homogeneous paths. By definition, the
strength of a correspondence-channel is determined by
the similarity of the textural properties of textures s and
v. That is, homogeneous paths s, s and v, v dominate
heterogeneous paths s, v.

In theory, multiple channels of each type may be
involved in a motion computation, yielding a motion
strength vector representation of arbitrary dimensionality.
However, the experimental results impose the
following constraints. First, the class of motion paths
equal in strength for both Scheme I and Scheme II
indicates that the computation is one-dimensional.
Second, the invariance of transitions for Scheme I and
Scheme II excludes correspondence-channels. This leaves
us with a system of multiple energy-channels that
combine into a single scalar representation of motion
strength.

Although  we  have  shown  that  a  single  energy-channel 
is  sufficient  to  model  the  data,  we  promised  a  proof  for 
the  necessity  of  a  single  energy-channel.  This  proof  is 
based  on  the  inconsistency  of  multiple  energy-channels 
with  transition  invariance.  We  assume  a  system  of 
multiple energy-channels that linearly combine to represent
motion strength (summation of energy-channels).
Such  a  system  would  result  in  different  transitions  for 
Scheme  I  and  II.  The  proof  is  given  and  discussed  in  the 
Appendix. 


The energy-channel

In this section, we derive the characteristics of the
single energy-channel. This energy-channel consists of
two stages. The first stage is the nonlinear transformation
(texture grabber). The simplest version of a texture
grabber is a spatiotemporal linear filter followed by
rectification (see Chubb & Sperling, 1989a, b). The output
of this first stage (the texture activity) is fed into the
second stage: motion energy analysis. Stages one and
two are sketched in Fig. 10.

Stage 1: texture grabbers. It is now well established
(see review by Shapley & Enroth-Cugell, 1984) that
early retinal gain-control mechanisms pass not stimulus
luminance, but rather a signal approximating stimulus
contrast, the normalized deviation of stimulus luminance
from its local average. We assume that the
spatiotemporal filters of Stage 1 operate on stimulus
contrast.

The output magnitude of these filters varies over the
visual field, depending on what textures happen to
populate these regions. The output of a linear filter to a
texture is variable and depends on the local phase of the
texture. The purpose of rectification is to transform
regions of highly variable response into regions of high
average value, thus ensuring that the rectified output
registers the presence or absence of texture, independent
of phase. Examples of rectification are half-wave
rectification (setting negative values to zero) and full-wave
rectification (anything that is symmetric with respect to
input sign, such as absolute value or squaring).

FIGURE 10. Diagram of a single channel motion computation. First,
stimulus amplitude is extracted, followed by a linear spatial filter F and
rectification. The spatial filter together with the rectification is called
"texture grabber" (the first stage). The output of the texture grabber
is called activity. The second stage (motion energy analysis) is basically
a coincidence detector: it computes the product of the delayed activity
at location 1 with the current activity at location 2. Response
variability across trials is due to internal noise, which is modeled by an
additive noise having a normal density function with mean 0
and standard deviation $\lambda$. The heterogeneous path is dominant
whenever the net motion strength in the direction of the heterogeneous
motion path (after adding noise) is positive (decision stage).

The output of Stage 1 is called activity. The resulting
transformation (accomplished by Stage 1) yields a
spatiotemporal function whose value reflects the local
texture preferences of the Stage 1 filter in the visual field
as a function of time (see also Bergen & Adelson, 1988;
Caelli, 1985). The activity transformation of the texture
grabber depends on the amplitude $m$ and spatial
frequency $\omega$ of the textures involved.

In Expt 3, we have shown that texture activity is linear
in texture amplitude. This is accommodated by a spatial
filter that is linear in stimulus contrast. We can further
characterize the spatial filter characteristics by the
amplitude of its Fourier transform: $F(\omega)$. We assume that
rectification is an absolute value operation. Thus, after
rectification, the activity transformation $T$ is
proportional to $m$ and to $F(\omega)$:

$$T(m, \omega) = m F(\omega). \tag{2}$$

This texture activity $T$ is fed into the second (motion
energy analysis) stage.
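
The first stage can be sketched numerically as follows. This is only an illustration under stated assumptions: the Gaussian low-pass response in `filter_gain` and its `cutoff` parameter are hypothetical stand-ins for the unknown $F(\omega)$, and the activity is taken as the spatial mean of the absolute filter response to a sine grating.

```python
import math

def filter_gain(omega, cutoff=5.0):
    """Hypothetical low-pass amplitude response F(omega); an assumption, not the measured filter."""
    return math.exp(-(omega / cutoff) ** 2)

def texture_activity(m, omega, phase=0.0, samples=3600):
    """Mean full-wave-rectified filter response to a sine grating.

    A linear shift-invariant filter passes a sinusoid of frequency omega
    scaled by F(omega), so the filtered signal is m * F(omega) * sin(...).
    Its absolute value is averaged over exactly one spatial period.
    """
    gain = filter_gain(omega)
    total = 0.0
    for k in range(samples):
        x = k / (samples * omega)              # positions spanning one period
        response = m * gain * math.sin(2.0 * math.pi * omega * x + phase)
        total += abs(response)                 # full-wave rectification
    return total / samples

low = texture_activity(0.5, 1.2)
high = texture_activity(1.0, 1.2)              # doubling m doubles the activity
shifted = texture_activity(0.5, 1.2, phase=1.0)  # phase leaves it essentially unchanged
```

In this sketch the activity equals $(2/\pi)\, m F(\omega)$ up to sampling error, so it is proportional to both $m$ and $F(\omega)$ as in equation (2); the constant $2/\pi$ is absorbed into the proportionality.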

Stage  2:  motion  energy  analysis.  The  second  stage 
(motion  energy  analysis)  is  a  coincidence  detector;  it 
computes  the  product  of  the  delayed  activity  at  Location 
1  with  the  current  activity  at  Location  2  (van  Santen  & 
Sperling, 1984). For the displays we use in our experiments,
the output of the second stage corresponds to
motion  strength. 
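
A coincidence detector of this kind can be sketched in discrete time as below. The function names, the one-frame delay, and the toy activity sequences are illustrative assumptions, not the parameters of the experiments.

```python
def half_reichardt(a1, a2, delay):
    """Sum over time of the delayed activity at location 1 times the current activity at location 2."""
    return sum(a1[t - delay] * a2[t] for t in range(delay, len(a2)))

def full_reichardt(a1, a2, delay):
    """Difference of the two mirror-image half detectors (positive: motion from 1 to 2)."""
    return half_reichardt(a1, a2, delay) - half_reichardt(a2, a1, delay)

# Toy input: texture activity appears at location 1, then one frame later at location 2.
loc1 = [0.0, 1.0, 0.0, 0.0]
loc2 = [0.0, 0.0, 1.0, 0.0]

forward = full_reichardt(loc1, loc2, delay=1)    # motion in the preferred direction
backward = full_reichardt(loc2, loc1, delay=1)   # same sequence seen by the mirror detector
```

The product is nonzero only when activity at location 1 is followed, after the delay, by activity at location 2, which is why the output can serve as a signed measure of motion strength.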

To simplify the computation in the model, we assume
that the first-stage spatiotemporal filter is space-time
separable. Indeed, space-time separability seems to be
the rule in apparent motion (Burt & Sperling, 1981; van
de Grind, Koenderink & van Doorn, 1986).* Given
space-time separability, we can ignore the temporal
component of filtering because temporal patterns were
not varied in our stimuli.

We proceed as follows. The perceived direction of
motion is considered to be the outcome of a competition
in motion strength between motion paths. Within a
path, the strength of motion between a patch of texture
v and a patch of texture s is determined by the product
of the activities of the first stage. We assume that the
strengths of detectors for all paths are additive in the
final motion percept, and adopt a linear combination
model (Dosher, Sperling & Wurst, 1986). Additive
internal noise determines the shape of the psychometric
functions for motion direction as a function of
amplitude.

Consider the strength model with respect to competition
Scheme I (Fig. 3). In one direction there is a
homogeneous motion path containing patches of identical
texture s. In the opposite direction, there is a
heterogeneous motion path containing patches of different
textures s and v. For sine wave stimuli, a half-Reichardt
model (simple product) is equivalent to the whole
Reichardt model (difference of products) (van Santen &
Sperling, 1985), so we need to consider just a simple
product rule.

The strength of the heterogeneous motion path is:

$$S_{1,he}(m_s, \omega_s, m_v, \omega_v) = m_s F(\omega_s)\, m_v F(\omega_v). \tag{3}$$

The motion strength $S_{1,ho}$ for the homogeneous motion
path is equal to:

$$S_{1,ho}(m_s, \omega_s) = -m_s^2 F^2(\omega_s) \tag{4}$$

(strength in the opposite direction has opposite sign).

Linear combination of both components with equal
weights yields a net motion strength $D_1$ in the direction
of the heterogeneous path:

$$D_1(m_s, \omega_s, m_v, \omega_v) = S_{1,he}(m_s, \omega_s, m_v, \omega_v) + S_{1,ho}(m_s, \omega_s). \tag{5}$$

Response variability across trials is due to additive
internal noise, which is assumed to be distributed as a
normal density function with mean 0 and
standard deviation $\lambda$ (Fig. 10). A linear addition of noise
yields the internal decision variable $i$, which has a normal
distribution $N$ with mean $D_1$ and standard deviation $\lambda$.
According to signal detection theory (Green & Swets,
1966), the probability $P_1$ of heterogeneous motion
dominance is:

$$P_1(m_v; \omega_v) = P(i > 0) = \int_0^\infty N\{D_1(m_s, \omega_s, m_v, \omega_v), \lambda\}\, di. \tag{6}$$

Substituting motion strengths [expressions (3) and (4)]
into the additive linear combination [expression (5)] and
then substituting expression (5) into the noise-driven
decision process [expression (6)] yields:

$$P_1(m_v; \omega_v) = \int_0^\infty N\{[m_s F(\omega_s)\, m_v F(\omega_v) - m_s^2 F^2(\omega_s)], \lambda\}\, di \tag{7}$$

for the probability of heterogeneous motion dominance
for Scheme I (Fig. 3).

Similar reasoning yields the net motion strength $D_2$
and the probability $P_2(m_v; \omega_v)$ of heterogeneous motion
dominance in Scheme II (see Fig. 4):

$$D_2(m_s, \omega_s, m_v, \omega_v) = S_{2,he}(m_s, \omega_s, m_v, \omega_v) + S_{2,ho}(m_v, \omega_v) \tag{8}$$

$$= m_s F(\omega_s)\, m_v F(\omega_v) - m_v^2 F^2(\omega_v) \tag{9}$$

and

$$P_2(m_v; \omega_v) = \int_0^\infty N\{[m_s F(\omega_s)\, m_v F(\omega_v) - m_v^2 F^2(\omega_v)], \lambda\}\, di. \tag{10}$$

This model predicts the transition and steepness at
transitions of the probability curves for both experiments.


*It is reasonable to consider that the linear filter in the texture grabber
may itself be composed as a weighted sum of many filters, i.e. filters
that also are in the processing path for first-order motion. A linear
filter composed as the sum of component filters would be
space-time separable if each of its component filters were
space-time separable and had the same temporal function,
independent of spatial scale. This seems to be the case in motion
processing (Burt & Sperling, 1981; van de Grind et al., 1986).
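
The strength-plus-noise model for both competition schemes can be sketched in a few lines. This is a minimal illustration, assuming the decision variable is normal with mean equal to the net motion strength and standard deviation `lam`, so that the probability of it being positive is a cumulative normal. The filter gains `F_S`, `F_V` and the noise level `LAM` are hypothetical values chosen only for the example.

```python
import math

def normal_cdf(z):
    """Cumulative standard normal distribution."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def net_strength_scheme1(m_s, f_s, m_v, f_v):
    """Heterogeneous (s, v) strength minus homogeneous (s, s) strength."""
    return m_s * f_s * m_v * f_v - (m_s * f_s) ** 2

def net_strength_scheme2(m_s, f_s, m_v, f_v):
    """Heterogeneous (s, v) strength minus homogeneous (v, v) strength."""
    return m_s * f_s * m_v * f_v - (m_v * f_v) ** 2

def p_heterogeneous(d, lam):
    """P(i > 0) for a decision variable i ~ Normal(mean=d, sd=lam)."""
    return normal_cdf(d / lam)

# Hypothetical parameters: filter gains at the two spatial frequencies and noise level.
M_S, F_S, F_V, LAM = 0.5, 0.4, 1.0, 0.02

m_balance = M_S * F_S / F_V   # amplitude of v at which both paths have equal activity
p1_at_balance = p_heterogeneous(net_strength_scheme1(M_S, F_S, m_balance, F_V), LAM)
p2_at_balance = p_heterogeneous(net_strength_scheme2(M_S, F_S, m_balance, F_V), LAM)
p2_at_zero = p_heterogeneous(net_strength_scheme2(M_S, F_S, 0.0, F_V), LAM)
```

In this sketch both schemes give a probability of one half at the same amplitude of texture v, and Scheme II also gives one half when texture v has zero amplitude, where the stimulus is ambiguous.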


Predictions for Scheme I

For different spatial frequencies $\omega_v$ of texture v, we
measured the probability $P_1(m_v; \omega_v)$ of heterogeneous
motion dominance as a function of the amplitude $m_v$ of
texture v. Our model predicts that the probability $P_1$ of
heterogeneous motion dominance is an error function
of the net motion strength $D_1$ [see equation (6)]. In
this experiment, the net motion strength $D_1$ is linear in
$m_v$. Hence, we expect an error function for the
probability function $P_1(m_v; \omega_v)$ as a function of $m_v$ [see
equation (7)].

Transition amplitude. The transition amplitude $\mu_1(\omega_v)$
is defined as the amplitude $m_v$ of texture v at which the
probability of heterogeneous motion dominance
$P_1(m_v; \omega_v)$ is 50% for a given spatial frequency $\omega_v$ of
texture v. Hence, for $m_v = \mu_1(\omega_v)$, the strengths of the
heterogeneous and homogeneous motion paths are
balanced and we have $S_{1,he} = -S_{1,ho}$, or [see expressions
(3) and (4)]:

$$\mu_1(\omega_v) F(\omega_v) = m_s F(\omega_s) = \kappa, \tag{11}$$

where $\kappa$ is a constant equal to the activity of standard
texture s. If $F(\omega_v)$ is a low-pass filter, $\mu_1(\omega_v)$ will be a
monotonically increasing function of $\omega_v$ (as supported
by our experiments):

$$\mu_1(\omega_v) = \kappa F^{-1}(\omega_v). \tag{12}$$

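Steepness. The steepness $\sigma_1(\omega_v)$ is defined as the
derivative of $P_1(m_v; \omega_v)$ with respect to $m_v$ at transition
amplitude $\mu_1(\omega_v)$:

$$\sigma_1(\omega_v) = \frac{\partial}{\partial m_v} P_1(m_v; \omega_v)\Big|_{m_v = \mu_1(\omega_v)} = \frac{1}{\sqrt{2\pi}\,\lambda}\, \kappa F(\omega_v). \tag{13}$$

Thus, the steepness $\sigma_1(\omega_v)$ is expected to decrease as a
function of the spatial frequency $\omega_v$ for low-pass filters
(as supported by our experiments).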

In conclusion, we expect error functions for the
probability $P_1(m_v; \omega_v)$ of heterogeneous motion dominance
as a function of amplitude $m_v$ with (a) a transition
amplitude $\mu_1(\omega_v)$ that is inversely proportional to
$F(\omega_v)$ and (b) a steepness $\sigma_1(\omega_v)$ that is proportional
to $F(\omega_v)$. If we have low-pass filters, $F(\omega_v)$ decreases
monotonically with spatial frequency $\omega_v$.
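
The two predictions for Scheme I can be checked numerically with a small sketch. It assumes the probability of heterogeneous dominance is a cumulative normal of the net strength divided by the noise standard deviation; the parameter values are hypothetical.

```python
import math

def p1(m_v, m_s, f_s, f_v, lam):
    """Probability of heterogeneous dominance in Scheme I (cumulative normal of D_1 / lam)."""
    d1 = m_s * f_s * m_v * f_v - (m_s * f_s) ** 2
    return 0.5 * (1.0 + math.erf(d1 / (lam * math.sqrt(2.0))))

def transition_amplitude(m_s, f_s, f_v):
    """Amplitude of v at which P_1 = 0.5: kappa / F(omega_v), with kappa = m_s * F(omega_s)."""
    return m_s * f_s / f_v

def steepness(m_s, f_s, f_v, lam):
    """Predicted slope of P_1 at the transition: kappa * F(omega_v) / (sqrt(2*pi) * lam)."""
    return m_s * f_s * f_v / (math.sqrt(2.0 * math.pi) * lam)

# Hypothetical parameter values for illustration only.
m_s, f_s, f_v, lam = 0.5, 0.4, 0.8, 0.05

mu = transition_amplitude(m_s, f_s, f_v)
h = 1e-5
numeric_slope = (p1(mu + h, m_s, f_s, f_v, lam) - p1(mu - h, m_s, f_s, f_v, lam)) / (2.0 * h)
predicted_slope = steepness(m_s, f_s, f_v, lam)
```

The central-difference slope agrees with the closed-form steepness, and halving `f_v` in this sketch doubles the transition amplitude while halving the steepness, which is the low-pass signature described above.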

Predictions  for  Scheme  II 

For different spatial frequencies $\omega_v$ of texture v, we
measured the probability $P_2(m_v; \omega_v)$ of heterogeneous
motion dominance as a function of the amplitude $m_v$ of
texture v. $P_2(m_v; \omega_v)$ is an error function of $D_2$ [see
equation (10)]. However, for Scheme II (unlike for
Scheme I) $D_2$ is not linear with the varied amplitude $m_v$
of texture v. As we increase the amplitude $m_v$ of texture
v, $D_2$ shows a quadratic dependence on $m_v$. Therefore,
we do not expect an error function for $P_2(m_v; \omega_v)$.

If amplitude $m_v$ of texture v is zero, the probability of
heterogeneous motion dominance $P_2$ will be 50% (the
motion stimulus is purely ambiguous!). Starting at
$m_v = 0$, it first increases linearly with $m_v$, is maximal for
$m_v = m_s F(\omega_s)/[2F(\omega_v)]$, and decreases again with further
increases of $m_v$. Obviously, there may exist an amplitude
$m_v = \mu_2$ (between the "optimal" amplitude, that yields a
maximal $D_2$, and a very high amplitude, that yields a
negative $D_2$) for which $P_2 = 50\%$.

Analogous to the derivation in the previous section,
one can find the analytic expressions for the transition
$\mu_2(\omega_v)$ and steepness $\sigma_2(\omega_v)$ of the probability curves for
Scheme II. The expressions for the transition amplitudes
are equal: $\mu_2(\omega_v) = \mu_1(\omega_v)$. The expressions for the
steepness of the transitions for Scheme I and II differ
only in sign: $\sigma_2(\omega_v) = -\sigma_1(\omega_v)$.
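
As a worked check of these two statements, the following sketch uses only the Scheme II net strength (heterogeneous product minus the squared activity of texture v). Here $\kappa = m_s F(\omega_s)$, $\lambda$ denotes the standard deviation of the internal noise, and $\Phi$ is the cumulative standard normal; writing the decision rule as $\Phi(D_2/\lambda)$ is our notation, not the source's.

```latex
\begin{align*}
D_2 &= \kappa\, m_v F(\omega_v) - m_v^2 F^2(\omega_v)
     = m_v F(\omega_v)\,\bigl[\kappa - m_v F(\omega_v)\bigr],\\
D_2 = 0,\ m_v > 0 \;&\Longrightarrow\; \mu_2(\omega_v) = \kappa F^{-1}(\omega_v) = \mu_1(\omega_v),\\
\frac{\partial D_2}{\partial m_v} = 0 \;&\Longrightarrow\; m_v = \frac{\kappa}{2F(\omega_v)}
     = \frac{m_s F(\omega_s)}{2F(\omega_v)},\\
\frac{\partial D_2}{\partial m_v}\Big|_{m_v=\mu_2}
    &= \kappa F(\omega_v) - 2\mu_2 F^2(\omega_v) = -\kappa F(\omega_v),\\
\sigma_2(\omega_v) &= \frac{\partial}{\partial m_v}\,
     \Phi\!\left(\frac{D_2}{\lambda}\right)\Big|_{m_v=\mu_2}
   = \frac{1}{\sqrt{2\pi}\,\lambda}\,\frac{\partial D_2}{\partial m_v}\Big|_{m_v=\mu_2}
   = -\frac{\kappa F(\omega_v)}{\sqrt{2\pi}\,\lambda} = -\sigma_1(\omega_v).
\end{align*}
```

The sign flip arises because, at the nonzero transition, the psychometric curve for Scheme II is on the descending branch of the quadratic.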

The  texture  grabber 

We can simply find the Fourier transform $F(\omega)$ of the
low-pass filter from the reciprocal transition $\mu^{-1}(\omega_v)$ [see
expression (12)] and from the steepness $\sigma_1(\omega_v)$ as a
function of spatial frequency $\omega_v$ [see expression (13)].

The reciprocal transition amplitudes are expected to
be proportional to the function $F(\omega_v)$. Estimates of the
reciprocal transition amplitudes $\mu^{-1}(\omega_v)$ are shown in
Fig. 11.

From the reciprocal transitions in Fig. 11, it follows
that $F(\omega)$ is a low-pass filter in the range of frequencies
examined.

The model predicts that the steepness of the
probability function is proportional to the function $F(\omega_v)$
and inversely proportional to $\lambda$ (the strength of the
internal noise). Thus, unlike the transition amplitude,
the steepness is biased by the internal noise contribution.
If the relative strength of the internal noise is constant and
independent of the spatial frequency and amplitude of the
patches of texture involved, the steepness is expected to be
proportional to $F(\omega_v)$. Estimates of the steepness are shown in
Fig. 8. The steepness shows a tendency to decrease with
increasing spatial frequency. However, we find some
nonmonotonicity, in particular for higher spatial
frequencies. This may reflect a certain variability of the
internal noise for different spatial frequencies.

FIGURE 11. Reciprocal transitions $\mu^{-1}(\omega_v)$ as a function of spatial frequency $\omega_v$. Open circles for Scheme I; solid circles
for Scheme II. The vertical dashed guide line indicates the spatial frequency of texture s: $\omega_s = 4.9$ c/deg. The horizontal dashed
guide line indicates the reciprocal amplitude of texture s. The solid line curve is the mean of the reciprocal transitions. In terms
of the model, this curve shows the amplitude of the Fourier transform of the spatial filter $F(\omega)$ of the texture grabber involved
[see equation (2)].

EXPERIMENT  4:  PERCEIVED  CONTRAST 

We have discussed texture grabbers and motion
energy analysis in terms of objective amplitude of patches
of texture. The experiments implied that the activity of
the texture grabber increases monotonically with
objective amplitude and decreases monotonically with spatial
frequency. An interesting question is whether this
relation is consistent with the subjective amplitude of static
grating contrast as a function of spatial frequency. In
other words, is the activity of a texture grabber simply
proportional to the subjective amplitude?

To  answer  this  question,  we  performed  an  amplitude 
discrimination  experiment. 

Method 

In a two-interval presentation, subjects looked at an
annulus containing either gratings s or v. In one interval
we showed an annulus of gratings s (see frame $f_1$ of
Fig. 3), with fixed amplitude $m_s = 0.5$ and fixed spatial
frequency $\omega_s = 4.9$ c/deg. In the other interval we
showed an annulus of gratings v (see frame $f_2$ of Fig. 4),
with amplitude $m_v$ and spatial frequency $\omega_v$. The order
of presentation of the intervals was randomized. Each
annulus was shown for 133 msec (which is equal to the
frame display time in the motion stimulus). The intervals
were separated by a time interval of 133 msec in which
the screen was uniform with background luminance.
Apparatus, viewing conditions, and other aspects were
identical to the motion experiment.

Procedure 

The task of the subject was to indicate the interval that
contained the patches of grating with the highest
amplitude. We measured the probability $P_c(m_v; \omega_v)$ that
observers judge the grating v as the grating with the highest
amplitude as a function of the objective amplitude of
grating v. In the amplitude matching experiment, we
examined two spatial frequencies of grating v: $\omega_v = 1.2$ c/deg
and $\omega_v = 7.4$ c/deg. These were the lowest and
highest spatial frequencies for which we found transition
invariance in our motion experiment. From these
probability curves, we estimated the matching amplitude of
grating v for which the perceived amplitudes of gratings s
and v were equal. The precise estimation of the matching
amplitude was analogous to the estimation of transition
amplitude in the motion competition experiments.


[Figure 12: four panels (PW, v: 1.2 cpd; PW, v: 7.4 cpd; JS, v: 1.2 cpd; JS, v: 7.4 cpd); horizontal axes: amplitude m of v, 0.0 to 0.8.]

FIGURE 12. Results of the perceived amplitude experiment. Observers compared the amplitude of a grating v (spatial
frequency $\omega_v$ and amplitude $m_v$) with the amplitude of texture s ($m_s = 0.5$, $\omega_s = 4.9$ c/deg). Shown are the probabilities $P_c$ for
judging the amplitude of v higher than that of s (solid circles). The matching amplitude for texture v is the crossing of the
curve with the dashed 50% line. To compare the matching amplitude with the transition amplitude in the motion experiment,
we have shown the probabilities for Scheme I (open circles).

Results

In Fig. 12, we show the probabilities of judging the
amplitude of grating v higher than that of grating s (with
$m_s = 0.5$) as a function of objective amplitude $m_v$ (solid
circles). For all conditions and subjects, the perceived
amplitude of texture v increases monotonically with its
objective amplitude $m_v$. The amplitude $m_v$ where the
curve crosses the 50% guide line is the matching
amplitude. For a "low" spatial frequency grating v
($\omega_v = 1.2$ c/deg), we find that the perceived amplitudes of
s and v are matched when $m_v = 0.47$ for subject PW and
$m_v = 0.44$ for JS. This matching amplitude is close to the
objective amplitude $m_s = 0.5$ of grating s. For a "high"
spatial frequency grating v ($\omega_v = 7.4$ c/deg), the
matching amplitudes are $m_v = 0.54$ for PW and $m_v = 0.53$ for JS.

To compare the matching amplitude with the
transition amplitude in the motion experiments, we have
also shown the probabilities of perceiving heterogeneous
motion using Scheme I as a function of $m_v$ in the
corresponding panels.

Discussion 

Interestingly, the matching amplitudes for low and
high spatial frequency gratings are approximately equal
to the objective amplitude of grating s, for the range of
amplitudes and spatial frequencies of grating v
examined. That is, perceived amplitude does not depend on
spatial frequency. However, the amplitude of grating v
for balancing the motion paths when $\omega_v = 1.2$ c/deg for
Scheme I was: $m_v = 0.22$ for subject PW and $m_v = 0.36$
for JS. Obviously, at the transition amplitude for the
motion experiment, the perceived amplitudes of gratings s
and v are markedly different. That is, the activities of the
gratings are matched even when both spatial frequency
and perceived amplitude are different from grating s. In
conclusion, activity cannot be a function that depends
solely on perceived amplitude.
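
The size of this discrepancy can be made explicit with a short calculation. The sketch below uses the amplitudes quoted in the text for the 1.2 c/deg grating (transition amplitudes 0.22 and 0.36, matching amplitudes 0.47 and 0.44, as we read them) and assumes the balance condition of the model, under which the ratio of filter gains equals the ratio of the standard amplitude to the transition amplitude. The function name `relative_gain` is ours.

```python
M_S = 0.5  # amplitude of the standard grating s, as stated in the text

# Values quoted in the text for grating v at 1.2 c/deg.
transition = {"PW": 0.22, "JS": 0.36}   # amplitude of v that balances the motion paths (Scheme I)
matching = {"PW": 0.47, "JS": 0.44}     # amplitude of v that matches s in perceived amplitude

def relative_gain(m_s, amplitude_v):
    """Ratio F(omega_v) / F(omega_s) implied by amplitude_v * F(omega_v) = m_s * F(omega_s)."""
    return m_s / amplitude_v

for subject in ("PW", "JS"):
    motion_ratio = relative_gain(M_S, transition[subject])
    perceived_ratio = relative_gain(M_S, matching[subject])
    print(f"{subject}: motion {motion_ratio:.2f}, perceived {perceived_ratio:.2f}")
# PW: motion 2.27, perceived 1.06
# JS: motion 1.39, perceived 1.14
```

On this reading, the motion task weights the low-frequency grating noticeably more than the perceived-amplitude task does, which is the quantitative content of the conclusion above.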

EXPERIMENT  5:  DICHOPTIC  PRESENTATIONS 

Motivation

We have successfully modeled the strength of
motion-from-texture in terms of a texture grabber followed by
motion energy analysis. Motion energy analysis is a type
of motion computation that is not sensitive to
correspondences in textural features. An interesting property of
first-order motion energy analysis is that the neural
substrate for such a process is organized so as to require
successive stimulation to the same eye. When monocular
motion information is not available to the observer,
first-order motion energy analysis fails.

The motion system that extracts first-order motion
information of both eyes (when motion is presented
dichoptically) has been classified as a
correspondence-channel. For example, Pantle and Picciano (1976)
studied apparent motion with a three-dot stimulus and
reported element movement for monocular and
binocular presentation, but group movement for dichoptic
presentation. The group movement suggests that a
representation of features or shapes precedes the extraction of
motion. Also, Georgeson and Shackleton (1989) showed
that drifting squarewave gratings with missing
fundamental (MF) moved backwards when presented
monocularly (following the third harmonic) but moved
forwards when presented dichoptically. They suggested
that the perceived direction of dichoptic apparent
motion was consistent with a system that combines
information across spatial frequency channels to identify
local features and then tracks the location of
corresponding features over time.

Generalizing the above reasoning to second-order
motion, the motion mechanism for dichoptic
presentations of our (second-order) stimuli would be sensitive to
the similarity of the textures involved. Thus, the
contribution of what we call correspondence-channels might
be more pronounced when our competition schemes are
presented dichoptically (so far viewing has been
binocular in our experiments). We tested our energy-channel
model for motion-from-texture for both dichoptic and
monocular presentations of our motion stimuli. This test
may also locate the motion extraction process involved
in our stimuli in terms of different levels in the visual
nervous system (before or after the sites of binocular
combination).

Results 

The ambiguous motion competition Schemes I and II
can be presented dichoptically in two different modes. In
the first mode, the odd frames are presented to one eye
and the even frames to the other. In this way, the
spatiotemporal stimulus is purely ambiguous in each eye.
Both the heterogeneous and the homogeneous paths are
processed by dichoptic mechanisms. In this mode,
dichoptic mechanisms are not competing with monocular
mechanisms.

In the second mode, the patches of one texture type
are presented to one eye and the patches of the second
type of texture to the other eye. In
this way the homogeneous motion path (textures s for
Scheme I) is presented to one eye, while the textures v in
the other eye form a purely ambiguous stimulus. In this
mode, dichoptic mechanisms processing the
heterogeneous path have to compete with monocular
mechanisms processing the homogeneous path.

We determined the psychometric functions for
both competition schemes for a condition where the
textures s and v differ by two octaves in spatial frequency
($\omega_s = 4.9$ c/deg and $\omega_v = 1.2$ c/deg) for subject PW. The
binocular results were presented in the top-left panel of
Fig. 6. As discussed for Expts 1 and 2, a difference
between the transition amplitudes $\mu_1$ and $\mu_2$ indicates the
involvement of additional (correspondence) channels.
The results for monocular presentation were identical
(within measurement error) to the results for binocular
presentation. For both conditions, we find transition
invariance: $\mu_1 \approx \mu_2 \approx 0.2$.

The results for both modes of dichoptic presentation
were very similar to those for binocular presentation.
That is, dichoptic presentation yields psychometric
functions for Schemes I and II similar to those for
binocular presentation. For adequate amplitude $m_v$,
heterogeneous motion dominated homogeneous motion for
both modes of dichoptic presentation, suggesting the
dominance of an energy-channel even when monocular
motion information was absent. However, the
contribution of a correspondence-channel is noticeable for
dichoptic presentations; transition invariance no longer
holds. We found $\mu_1 \approx 0.2$ and $\mu_2 \approx 0.1$ for both modes of
dichoptic presentation.

Discussion 

Motion between patches of dissimilar
texture is easily perceived for both modes of dichoptic
presentation (as predicted by our energy-channel). Even
in the second mode, where a dichoptic heterogeneous
motion path competes with a monocular homogeneous
path, heterogeneous motion can easily dominate for
small amplitude of texture v (e.g. $m_v > 0.2$ for Scheme I).
These results suggest that dichoptic processing of our
motion stimuli is dominated by the same mechanisms as
monocular processing and that motion strength is not
predicted by the similarity between textural features such
as spatial frequency.

However, although dichoptic presentation leaves
transition amplitude $\mu_1$ for Scheme I unaffected, transition $\mu_2$
for Scheme II decreases. This difference from the
binocular results indicates a significant contribution of other
channels when monocular information for the
heterogeneous path is ambiguous. A more detailed
investigation might be useful.

GENERAL  DISCUSSION 

Fallacy  of  correspondence  matching 

The experiments presented in this paper provide
cogent evidence that texture similarity is not relevant to the
texture-defined motion computation (within the range of
spatiotemporal parameters varied in this experiment). As
an example, it was shown that motion between patches
of texture that differ by two octaves in spatial frequency
and a factor of 2 in amplitude can be stronger than
motion between patches of identical texture.

The correspondence matching metaphor to explain
visual processes in several visual domains seems to have
lost predictive power. Correspondence matching fails to
explain the dominance of (1) heterogeneous motion
paths composed of textures that differ in spatial
frequency and amplitude (this paper), (2) heterogeneous
motion paths composed of elements that differ in size,
orientation and luminance (Werkhoven et al., 1990a, b),
and (3) stereoscopic matches between elements that
differ in size and luminance (Gulick & Lawson, 1976).

The visual motion system does not seem to be
designed to establish correspondence between similar
features in a motion sequence. This should not come as a
surprise given the inherent difficulties in designing
correspondence matching mechanisms. Such mechanisms
would look for "similar features" in "successive" time
samples of the spatiotemporal stimulus. However, what
constitutes a feature, and how strictly should similarity be
taken?

Recently developed stimulus (motion) energy models
for motion extraction bypass the correspondence
problem and are more likely candidates for the kind of visual
processing early in the visual system (Adelson & Bergen,
1985; Heeger, 1992). The energy-channel described in
this paper is equivalent to such a motion energy
computation, applied to a nonlinear transformation of the
stimulus (van Santen & Sperling, 1984).

Contrast  and  motion 

In Expt 3, we showed that the transition amplitude of
texture v needed to balance the motion path s, v with the
motion path s, s varies linearly with the amplitude of
texture s. In the context of our model, this means that
the activity of a texture grabber is approximately linear
in texture amplitude. In fact, we find linearity even for
high amplitudes in the range of 50-100%. As a consequence
of this amplitude linearity, motion strength
varies linearly with the amplitude of each of the texture
inputs. That is, the strength of motion between two
textures with identical texture amplitude is quadratic
in this amplitude. Approximate amplitude linearity of
the input lines for first-order motion energy analysis was
also found in experiments with spatiotemporal modulations
of luminance (Werkhoven et al., 1990b).
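
The relation between linear activity and quadratic motion strength can be illustrated with a minimal sketch. It assumes that activity is exactly proportional to texture amplitude, with a hypothetical proportionality constant `gain`, and that motion strength is the plain product of the two activities; the function name is illustrative only.

```python
def motion_strength(amp_a, amp_b, gain=1.0):
    """Product of two texture-grabber activities.

    Each activity is assumed to be linear in texture amplitude
    (activity = gain * amplitude); `gain` is a hypothetical constant.
    """
    return (gain * amp_a) * (gain * amp_b)


# Strength is linear in each input separately ...
one_input_doubled = motion_strength(1.0, 0.5) / motion_strength(0.5, 0.5)
# ... and quadratic when both textures share the same amplitude.
both_inputs_doubled = motion_strength(1.0, 1.0) / motion_strength(0.5, 0.5)
```

Under these assumptions, doubling one amplitude doubles the strength, while doubling the common amplitude of both textures quadruples it.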

It should be noted that the linear amplitude dependency
is at odds with the amplitude thresholds for
motion direction discrimination reported by Nakayama
and Silverman (1985). They measured the smallest phase
shift (yielding threshold direction discrimination
performance) of sinusoidal gratings as a function of grating
amplitude. The smallest phase shift yielding threshold
performance leveled off for grating amplitudes exceeding
5%. They interpreted their finding in terms of an amplitude
saturation function. However, their results are open
to a different interpretation in which the minimum phase
shift is limited by other (spatial) properties of the motion
extraction mechanism, leaving the amplitude dependency
unknown.

A  shared  motion  analysis  stage? 

An intriguing question is how mechanisms for the
extraction of motion carried by the spatiotemporal
modulation of luminance relate to those for extracting
motion carried by the spatiotemporal modulation of
texture type. To distinguish the two mechanisms, we have
to compare the characteristics of the perception of both
motion types. For example, Turano and Pantle (1989)
studied velocity discrimination performance for both
types of motion stimuli and showed similar discrimination
characteristics. Their results support the hypothesis
of a higher order (motion analysis) mechanism that
accepts input from both the luminance domain and
the texture domain.

A shared motion energy analysis stage for the two
types of motion is also supported by our finding that
strength of motion-from-texture is ruled by the same
metric as motion in the luminance domain. Motion
strength is the covariance (or product) of local activities.
This activity is simply the luminance itself when the
motion is carried by luminance (van Santen & Sperling,
1984) or a nonlinear transformation of the luminance
pattern for motion-from-texture (this paper).

In conclusion, the extraction of motion from the
spatiotemporal modulations of luminance and that of
texture types seems to be mediated by a shared motion
energy analysis stage. However, additional experiments
with different paradigms may weaken this idea. For
example, Mather (1991) showed that both motion types
produce motion aftereffects, but that the durations of the
aftereffects were significantly different.

Transitivity  and  additivity 

Under the assumption of energy channels and channel
summation, the transition invariance of a pair of
textures s and v implies that s and v are (texture-defined)
motion metamers. That is, all such textures v in this
metameric class yield identical motion strength when
embedded in a motion path s, v.

Metamery yields two strong predictions. First,
metamery predicts transitivity: if textures a and b are
metameric with s, then a is metameric with b. Second,
metamery predicts additivity: if textures a and b are
metameric with s, then any linear combination $\alpha a + \beta b$
(with $\alpha + \beta = 1$) is metameric with s.

These  predictions  have  not  yet  been  tested. 

Motion  transparency 

The energy-channel proposed in this paper computes
the difference between left- and rightward motion. This
implies that motion transparency (the simultaneous
detection of left- and rightward motion) is not readily
accommodated in this model. (Because the motion analysis
component of the energy-channel is a Reichardt
correlator, the motion energies of the left- and rightward
motion paths are not explicit intermediate results.)
However, observers occasionally reported transparency for
stimuli that were nearly balanced.

Adelson and Bergen (1985) addressed this issue by
pointing out that although their energy detector is
functionally equivalent to a correlation detector, the
intermediate results are not. Specifically, the energies of
left- and rightward motion are explicit intermediate results in
energy detectors, but not in correlation detectors (the
output of a half Reichardt-correlation is the half-phase
opponent energy!). Although our conclusions do not
depend on the specific choice of motion model, a further
study of transparency in this context might reveal the
specific type of detector involved.

Extension  of  the  parameter  space 

It is important to remember that we have shown the
one-dimensionality of the motion-from-texture computation
only with respect to parallel sinewave patches that
differ in spatial frequency and amplitude. Chubb and
Sperling (1991) found that motion-from-texture could be
carried by differences in spatial orientation, although
differences in orientation did not produce as vigorous
motion as did differences in spatial frequency. This
observation indicates that orientation (and possibly
other properties) is relevant to motion-from-texture. It
would be interesting to determine the dimensionality of
the computation for a larger class of stimuli.

Although motion strength at a "frame time" r of
8/60 sec is exclusively determined by the product of
activities, we cannot exclude that effects of texture
similarity are stronger at longer frame times. In fact, the
temporal frequency of texture modulation in our experiments
is 1.9 Hz (one cycle consists of four frames of
133 msec each). At slower temporal frequencies, the
processing time for the textures increases, perhaps
enabling more elaborate "texture grabber" filters or
correspondence-channels to contribute to motion strength.
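
The quoted temporal frequency follows directly from the frame time. A short check of the arithmetic, using only the values stated above (a frame time of 8/60 sec and four frames per cycle); the variable names are illustrative:

```python
# Values taken from the text: frame time of 8/60 s, four frames per cycle.
frame_time_s = 8 / 60          # about 0.133 s per frame
frames_per_cycle = 4

cycle_duration_s = frames_per_cycle * frame_time_s
temporal_frequency_hz = 1 / cycle_duration_s   # 1.875 Hz, reported as 1.9 Hz
```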

Effects  of  other  properties  (e.g.  orientation)  and  tem¬ 
poral  parameters  are  currently  under  investigation. 
APPENDIX

Multiple Energy-Channels and Transition Invariance
A system of multiple energy-channels

We  propose  a  multi-channel  model  (multiple  energy-channels)  for 
computing  the  strength  of  motion-from-texture.  The  model  consists  of 
two  stages,  as  shown  in  Fig.  13. 

Stimulus transformation: texture grabbers. Stage 1 consists of n types
of texture grabbers, where each type of texture grabber i is described
by a nonlinear spatiotemporal transformation $T_i$, $i = 1, \ldots, n$, of the
optical input. Each transformation yields a spatiotemporal function
$T_i(\varphi, t)$ whose value reflects the local texture preferences of the
Stage 1 filters in the visual field as a function of position $\varphi$ and time
$t$. (We use $\varphi$ for position because, in our essentially one-dimensional
stimulus, the texture position is determined by the angle $\varphi$.) The output
of these texture grabbers is called activity. The n different
transformations $T_i$ of Stage 1 transform the optical input into n activity
representations.

Motion detection. Stage 2 is a set of motion detectors. For specificity,
but without loss of generality (see van Santen & Sperling, 1984; Chubb
& Sperling, 1988, 1991), we adopt Reichardt's scheme for standard
motion analysis (Reichardt, 1961), which consists of two oppositely
tuned coincidence detectors. Motion detectors operate on the outputs
of the texture grabbers. Each type of texture grabber (transformation
$T_i$) has its own, unique set of motion detectors. A transformation $T_i$
together with its motion detectors is called a motion channel i.

[Figure 13 diagram: Transformation stage ($T_1, T_2, T_3, \ldots, T_n$); Motion Energy Analysis stage (M, M, M, ..., M); Summation.]

FIGURE 13. A motion computation consisting of multiple energy-
channels. The first stage consists of n independent transformations $T_i$
(the texture grabbers). Transformation $T_i$ is a nonlinear transformation
(e.g. spatial filtering followed by rectification). The output of each
transformation is called an activity representation of the optical input.
Motion energy analysis (M) is applied to each of the activity
representations of the input. Finally the motion strength is summed across the
different channels.

A coincidence detector performs a multiplication operation on
the current activity $T_i(\varphi, t)$ at position $\varphi$ and time $t$ and the
(delayed) activity $T_i(\varphi - \Delta\varphi, t - \Delta t)$ at position $\varphi - \Delta\varphi$ and time
$t - \Delta t$. Hence, the output of the coincidence detector is
$T_i(\varphi - \Delta\varphi, t - \Delta t)\, T_i(\varphi, t)$. The outputs of two coincidence detectors
tuned to identical velocities but opposite directions are subtracted to
yield a net motion strength $D_i(\varphi, t)$:

$$D_i(\varphi, t) = T_i(\varphi - \Delta\varphi, t - \Delta t)\, T_i(\varphi, t) - T_i(\varphi - \Delta\varphi, t)\, T_i(\varphi, t - \Delta t). \qquad (14)$$

Channel i has a positive output for motion in the direction of
positive $\varphi$ and a negative output for motion in the opposite direction.
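
The opponent coincidence detector can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes the activity of one channel is sampled on a discrete grid `activity[t, phi]`, that position is circular (as for an angular stimulus), and that the detector spans one sample in space and time by default. The names `reichardt_output` and `moving_bar` are hypothetical.

```python
import numpy as np


def reichardt_output(activity, d_phi=1, d_t=1):
    """Opponent coincidence detector applied to one activity representation.

    activity[t, phi] is the channel activity at time index t and (circular)
    position index phi. Returns the net output for all time indices >= d_t.
    """
    a = np.asarray(activity, dtype=float)
    # Activity at (phi - d_phi, t - d_t): delayed signal from the neighbour.
    delayed_neighbour = np.roll(np.roll(a, d_t, axis=0), d_phi, axis=1)
    # Activity at (phi - d_phi, t): current signal from the neighbour.
    current_neighbour = np.roll(a, d_phi, axis=1)
    # Activity at (phi, t - d_t): delayed signal at this position.
    delayed_here = np.roll(a, d_t, axis=0)
    net = delayed_neighbour * a - current_neighbour * delayed_here
    # Drop the first d_t rows, where the time shift wrapped around.
    return net[d_t:]


def moving_bar(n_frames, n_positions, step):
    """Single active position that moves `step` samples per frame."""
    a = np.zeros((n_frames, n_positions))
    for t in range(n_frames):
        a[t, (t * step) % n_positions] = 1.0
    return a


rightward = reichardt_output(moving_bar(8, 8, +1)).sum()
leftward = reichardt_output(moving_bar(8, 8, -1)).sum()
```

With this sign convention, motion toward increasing position index gives a positive summed output and motion in the opposite direction gives a negative one; a spatially and temporally uniform activity gives zero.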

Summation. In a one-dimensional motion computation, the outputs
of a system of energy-channels described above (represented in an
n-dimensional channel space) are essentially mapped to a single
(decision) dimension: the final net motion strength. This mapping maps
an $(n - 1)$-dimensional manifold in the channel space to a single point
in the one-dimensional decision space (final motion strength). For
example, channel summation maps a planar surface in the channel
space to zero final motion strength (for Scheme I). For combination
rules other than summation, other (nonplanar) surfaces will map to
zero final motion strength. However, when we assume that this
mapping is continuous and differentiable, these true manifolds are, to
first order, approximated by a planar surface for small channel signals
at transition points. Channel summation is a sufficient first-order
combination rule.

Summation of the channel outputs $D_i$ yields net motion strength $D$:

$$D(\varphi, t) = \sum_{i=1}^{n} D_i(\varphi, t). \qquad (15)$$


Predictions for competition schemes

We apply the multi-channel computation to competition Schemes I
and II (see Figs 3 and 4). Consider first Scheme I. The heterogeneous
path is the motion between texture s (at time $t - \Delta t$ and
position $\varphi - \Delta\varphi$) and texture v (at time $t$ and position $\varphi$). Let $T_{s,i}$ be
the activity of texture grabber $T_i$ for texture s, and $T_{v,i}$ the activity
of texture grabber $T_i$ for texture v. The output of channel i for this
path is the product of the delayed activity $T_{s,i}$ of texture s and the
current activity $T_{v,i}$ of texture v. For simplicity, we will use the vector
notation:

$$\vec{T}_s = \begin{pmatrix} T_{s,1} \\ \vdots \\ T_{s,n} \end{pmatrix} \quad \text{and} \quad \vec{T}_v = \begin{pmatrix} T_{v,1} \\ \vdots \\ T_{v,n} \end{pmatrix}. \qquad (16)$$

The vectors $\vec{T}_s$ and $\vec{T}_v$ are the activity vectors of textures s and v
respectively. An activity vector represents the activity of a texture in
the n-dimensional transformation space (T-space) defined by
transformations $T_1, \ldots, T_n$.

For Scheme I, the motion strengths summed over all channels
for the heterogeneous path can be written as the scalar (inner) product:

$$S_{s,v} = \vec{T}_s \cdot \vec{T}_v. \qquad (17)$$

We have arbitrarily assigned a positive sign to motion strength in this
direction. Motion in the opposite direction has a negative sign [see
equation (14)]. The output of channel i for the homogeneous path
(between textures s) is the squared output of transformation $T_{s,i}$. The
motion strength $S_{s,s}$ of the homogeneous path (after summing all
channels) is:

$$S_{s,s} = -\vec{T}_s \cdot \vec{T}_s. \qquad (18)$$

Adding equations (17) and (18) gives the net motion strength $D_1$ in the
direction of the heterogeneous path for Scheme I:

$$D_1 = \vec{T}_s \cdot (\vec{T}_v - \vec{T}_s). \qquad (19)$$

Analogously, the net motion strength $D_2$ in the direction of the
heterogeneous path for Scheme II is:

$$D_2 = \vec{T}_v \cdot (\vec{T}_s - \vec{T}_v). \qquad (20)$$

FIGURE 14. Solutions for transitions (path equality) in a two-dimensional
T-space. Each texture in a motion path is processed by different
texture grabbers. Vector $\vec{T}_v$ represents the activity of texture v in
T-space, vector $\vec{T}_s$ that of s. The collection of activity vectors $\vec{T}_v$ that
satisfy the constraints for path equality is given by the thin line in (a)
for Scheme I and by a thin circle in (b) for Scheme II.


Transitions: Scheme I

At a transition for Scheme I, the net motion strength $D_1$ is zero:

$$D_1 = \vec{T}_s \cdot (\vec{T}_v - \vec{T}_s) = 0. \qquad (21)$$

There exists an $(n - 1)$-dimensional plane of $\vec{T}_v$ vectors in T-space
for which the motion strengths of the heterogeneous and homogeneous
motion paths are balanced (the vectors $\vec{T}_v$ for which the difference
vector $\vec{T}_v - \vec{T}_s$ is orthogonal to vector $\vec{T}_s$).

Consider, for example, a two-dimensional T-space (a two-channel
motion computation). The vectors $\vec{T}_v$ in T-space that satisfy equation
(21) for a certain vector $\vec{T}_s$ must end on the thin guide line in Fig. 14(a).

It should be noted in passing that the net heterogeneous motion
strength $D_1 = \vec{T}_s \cdot (\vec{T}_v - \vec{T}_s)$ can be positive. Hence, even in a
multi-channel computation, the strength of the heterogeneous motion path
can dominate.


Transitions: Scheme II

Similarly, at a transition for Scheme II (Fig. 4), the net motion
strength $D_2$ is zero:

$$D_2 = \vec{T}_v \cdot (\vec{T}_s - \vec{T}_v) = 0. \qquad (22)$$

The $(n - 1)$-dimensional solution of $\vec{T}_v$ vectors in T-space for which
the motion strengths of the heterogeneous and homogeneous motion
paths are balanced is not a plane. For example, we consider again the
two-dimensional T-space. The vectors $\vec{T}_v$ in T-space that satisfy
equation (22) for a certain vector $\vec{T}_s$ end on a circle containing $\vec{T}_s$ [see
Fig. 14(b)].
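
The two transition conditions can be explored numerically in a two-channel T-space. The sketch below assumes the net strengths take the forms $\vec{T}_s \cdot (\vec{T}_v - \vec{T}_s)$ for Scheme I and $\vec{T}_v \cdot (\vec{T}_s - \vec{T}_v)$ for Scheme II; the function names and the example vectors are hypothetical and chosen only for illustration.

```python
import numpy as np


def net_strength_scheme1(t_s, t_v):
    """Net strength toward the heterogeneous path in Scheme I (assumed form)."""
    t_s = np.asarray(t_s, dtype=float)
    t_v = np.asarray(t_v, dtype=float)
    return float(t_s @ (t_v - t_s))


def net_strength_scheme2(t_s, t_v):
    """Net strength toward the heterogeneous path in Scheme II (assumed form)."""
    t_s = np.asarray(t_s, dtype=float)
    t_v = np.asarray(t_v, dtype=float)
    return float(t_v @ (t_s - t_v))


t_s = [2.0, 0.0]

# A vector on the Scheme I solution line: t_v - t_s is orthogonal to t_s.
on_line = [2.0, 3.0]
# A vector on the Scheme II solution circle (centre t_s / 2, radius |t_s| / 2).
on_circle = [1.0, 1.0]
# A vector for which the heterogeneous path dominates in Scheme I.
dominant = [3.0, 0.0]
```

In this example `on_line` balances Scheme I but not Scheme II, `on_circle` balances Scheme II but not Scheme I, and `dominant` gives a positive Scheme I net strength.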


Transition invariance

Using only the result for Scheme I, we cannot discriminate between
a single-channel ($n = 1$) and a multi-channel computation ($n > 1$):
either single- or multi-channel computations might yield solutions to
equation (21). To resolve the issue, we need the constraint of transition
invariance.

Transition invariance means that once the motion strength of the
heterogeneous path and that of the homogeneous motion path are
balanced for a particular pair of textures s and v for Scheme I, this
balance is not disturbed by interchanging the textures s and v (yielding
Scheme II). We now show that transition invariance is inconsistent
with a multi-channel computation.

The transitions are invariant if the activity vector $\vec{T}_v$ simultaneously
satisfies equations (21) and (22). Because the difference vector $\vec{T}_v - \vec{T}_s$
is always in the plane defined by vector $\vec{T}_s$ and vector $\vec{T}_v$, the only
vector $\vec{T}_v$ that satisfies both equations is $\vec{T}_v = \vec{T}_s$.
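
The step from the two transition conditions to equality of the activity vectors can be spelled out. This is a sketch that assumes the Scheme I and Scheme II conditions take the forms $\vec{T}_s \cdot (\vec{T}_v - \vec{T}_s) = 0$ and $\vec{T}_v \cdot (\vec{T}_s - \vec{T}_v) = 0$, and that $\vec{T}_s$ is not the zero vector.

```latex
\begin{align*}
\vec{T}_s \cdot \vec{T}_v &= \lVert \vec{T}_s \rVert^2
  && \text{(Scheme I transition)} \\
\vec{T}_s \cdot \vec{T}_v &= \lVert \vec{T}_v \rVert^2
  && \text{(Scheme II transition)} \\
\Rightarrow\quad \lVert \vec{T}_s \rVert &= \lVert \vec{T}_v \rVert
  \quad\text{and}\quad
  \vec{T}_s \cdot \vec{T}_v = \lVert \vec{T}_s \rVert \, \lVert \vec{T}_v \rVert .
\end{align*}
```

The last relation is the equality case of the Cauchy-Schwarz inequality, which holds only when the two vectors are parallel and point in the same direction. Since they also have equal length, they must coincide.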

Vector $\vec{T}_v$ is equal to vector $\vec{T}_s$ if each transformation $T_i$ involved
in the motion computation has an equal output for both textures v
and s:

$$T_{v,i} = T_{s,i} \quad (i = 1, \ldots, n). \qquad (23)$$
Equation (23) represents a very strong constraint for the ensemble
of transformations that might be involved in a multi-channel
computation. Every transformation $T_i$ must have an iso-activity contour
as a function of all textural properties (e.g. frequency-amplitude
space) that contains both the activity of texture s and that of
texture v. Furthermore, transition invariance holds for different
texture pairs (s, v); the iso-activity contours of each transformation
$T_i$ must be identical for all these pairs. Transformations that are
identical at arbitrarily many observable points are identical in the
range of observable points. To say that all $T_i$ are identical is equivalent
to saying that there is only one $T_i$, that is, the T-space is
one-dimensional.


Using  Repetition  Detection  to  Define 
and  Localize  the  Processes  of 
Selective  Attention 
activity, assigns 1 to the points in the leftmost eighth and -1 to the
points in the right seven-eighths. The subsequent timeblock pictures continue to
shift the vertical edge rightward until, in picture 8, the function is uniformly 1.
Multiplying each timeblock picture i = 1, 2, ..., 9 by its associated random variable
yields, in this particular realization, the stimulus given in Fig. 4b.

between the vertical and horizontal gratings of Figs. 5b1 and 5b2. If the
function of Fig. 5a is 1 at a certain point in space-time, the corresponding point in
Fig. 5c is assigned the value of the corresponding point in Fig. 5b1; otherwise the
point in Fig. 5c is assigned the value of the corresponding point in Fig. 5b2.
Although Figs. 5c and 5d look similar, they differ in an important respect: the
Fig. 3. Fourier and nonFourier motion mechanisms. (a) Fourier motion mechanisms apply standard
motion analysis directly to the luminance signal L. (b, c, d) NonFourier mechanisms apply standard
motion analysis to a nonlinear transformation of luminance. (b) A simple nonFourier mechanism applies a
signal transformation comprised of a spatiotemporal linear filter, followed by a pointwise nonlinearity. The
*'s indicate spatial and temporal convolution, respectively, and • indicates multiplication. The filtering
performed in (b) is roughly pointwise in time (the temporal impulse response b2 approximates an impulse),
and the nonlinearity implied is a full-wave rectifier. This system (with appropriately chosen spatial filter,
b1) will extract the motion of the texture quilts shown in Figs. 4b, 5d, and 6d. It will not extract the
motion of the traveling contrast-reversal of the random vertical bar pattern shown in Fig. 2a.
(c) A spatially pointwise system (the spatial impulse response c1 approximates an impulse) with a flicker-
sensitive temporal filter and a full-wave rectifier. Because of the flicker sensitivity, this mechanism will
extract the motion of the traveling contrast-reversal of the random vertical bar pattern shown in Fig. 2a but
not the motion of the texture quilts shown in Figs. 4b, 5d, 6c, and 6d. (d) The temporal filter d2 averages
the temporal filters b2 and c2, and the pointwise nonlinearity is a full-wave rectifier. With an appropriate
spatial filter d1, this nonFourier system extracts the motion of any corresponding texture quilt as well as the
motion of the traveling contrast-reversal of the random vertical bar pattern shown in Fig. 2a. However, it
would be less well-suited to these tasks than the detectors shown in (b) and (c) whose temporal filters it
averages.


Fig. 8. The percent of correct direction-of-motion judgments to the F-quilt, the O-quilt, and the E-quilt as
a function of stimulus duration (msec). The panels show data for subjects CC and GA, respectively. Each data
point is the mean of 100 judgments. (Squares) F-quilt; (triangles) O-quilt; (circles) E-quilt. The stimulus
durations of 133, 266, 400, and 533 ms correspond to stimulus presentations of 0.5, 1, 1.5 and 2 quilt
OBJECT SPATIAL FREQUENCIES, RETINAL SPATIAL
FREQUENCIES, NOISE, AND THE EFFICIENCY OF
LETTER DISCRIMINATION

David H. Parish and George Sperling*

Human Information Processing Laboratory, Department of Psychology and Center for Neural Sciences,

New York University, NY 10003, U.S.A.

(Received 7 July 1985; in revised form 2 June 1990)

Abstract. To determine which spatial frequencies are most effective for letter identification, and whether
this is because letters are objectively more discriminable in these frequency bands or because humans can utilize
the information more efficiently, we studied the 26 upper-case letters of English. Six two-octave wide filters
were used to produce spatially filtered letters with 2D-mean frequencies ranging from 0.4 to 20 cycles per
letter height. Subjects attempted to identify filtered letters in the presence of identically filtered, added
Gaussian noise. The percent of correct letter identifications vs s/n (the root-mean-square ratio of signal
to noise power) was determined for each band at four viewing distances ranging over 32:1. Object spatial
frequency band and s/n determine presence of information in the stimulus; viewing distance determines
retinal spatial frequency, and affects only ability to utilize. Viewing distance had no effect upon letter
discriminability; object spatial frequency, not retinal spatial frequency, determined discriminability. To
determine discrimination efficiency, we compared human discrimination to an ideal discriminator. For our
two-octave wide bands, s/n performance of humans and of the ideal detector improved with frequency
mainly because linear bandwidth increased as a function of frequency. Relative to the ideal detector,
human efficiency was 0 in the lowest frequency bands, reached a maximum of 0.42 at 1.5 cycles per object
and dropped to about 0.104 in the highest band. Thus, our subjects best extract upper-case letter
information from spatial frequencies of 1.5 cycles per object height, and they can extract it with equal
efficiency over a 32:1 range of retinal frequencies, from 0.074 to more than 2.3 cycles per degree of visual
angle.

Spatial  filtering  Scale  invariance  Psychophysics  Contrast  sensitivity  Acuity 


INTRODUCTION 

Characterizing  objects 

When we view objects, what range of spatial
frequencies is critical for recognition, and how
is our visual system adapted to perceive these
frequencies? Ginsburg (1978, 1980) was among
the first to investigate this problem by means of
spatial bandpass filtered images of faces and
lowpass filtered images of letters. He noted the
lowest frequency band for faces and the cutoff
frequency for letters at which the images seemed
to him to be clearly recognizable. The cutoff
frequency for letters was 1-2 cycles per letter
width; faces were best recognized in a band
centered at 4 cycles per face width. He also
proposed that the perception of geometric visual
illusions, such as the Mueller-Lyer and Poggendorff,
was mediated by low spatial frequencies
(Ginsburg, 1971, 1978; Ginsburg & Evans,
1979).


*To  whom  reprint  requests  should  be  addressed. 


An issue that is related to the lowest frequency
band that suffices for recognition is the
encoding economy of a band. For a filter with
a bandwidth that is proportional to frequency
(e.g. a two-octave-wide filter), the lower the
frequency, the smaller the number of frequency
components needed to encode the filtered image
of a constant object. Combining these two
notions, Ginsburg concluded that objects were
best, or most efficiently, characterized by the
lowest band of spatial frequencies that sufficed
to discriminate them. Ginsburg (1980) went on
to suggest that higher spatial frequencies were
redundant for certain tasks, such as face or
letter recognition.

Several investigators were quick to point out
that objects can be well discriminated in various
spatial frequency bands. Fiorentini, Maffei and
Sandini (1983) observed that faces were well
recognized in either highpass or lowpass filtered
bands. Norman and Ehrlich (1987) observed that
high spatial frequencies were essential for
discrimination between toy tanks in photographs.
With respect to geometric illusions, both Janez
(1984) and Carlson, Moeller and Anderson
(1984) observed that the geometric illusions
could be perceived for images that had been
highpass filtered so that they contained no
low spatial frequencies. This suggests that low
and high spatial frequency bands may carry
equivalently useful information for higher visual
processes.

Characterizing  the  visual  system 

In the studies cited above, the discussion of
spatial filtering focuses on object spatial
frequencies, that is, frequencies that are defined in
terms of some dimension of the object they
describe (cycles per object). Most psychophysical
research with spatial frequency bands has
focused on retinal spatial frequencies, that is,
frequencies defined in terms of retinal
coordinates. For example, the spatial contrast
sensitivity function (Davidson, 1968; Campbell &
Robson, 1968) describes the threshold
sensitivity of the visual system to sine wave gratings
as a function of their retinal spatial frequency.
Visual system sensitivity is greatest at 3-10
cycles per degree of visual angle (c/deg). How
does visual system sensitivity relate to object
spatial frequencies?

Unconfounding retinal and object spatial
frequencies

Retinal  spatial  frequency  and  object  spatial 
frequency  can  be  varied  independently  to  deter¬ 
mine  whether  certain  object  frequencies  are  best 
perceived  at  particular  retinal  frequencies.  Ob¬ 
ject  frequency  is  manipulated  by  varying  the 
frequency  band  of  bandpass  filtered  images; 
retinal  frequency  is  manipulated  by  varying  the 
viewing  distance. 
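
The dependence of retinal frequency on viewing distance can be made concrete with a small geometric sketch. It assumes that retinal spatial frequency (cycles per degree) equals the object spatial frequency (cycles per object) divided by the visual angle subtended by the object, and that the object is viewed frontally; the function name and the example numbers are hypothetical.

```python
import math


def retinal_frequency(object_frequency, object_height, viewing_distance):
    """Retinal frequency in cycles per degree of visual angle.

    object_frequency: cycles per object height.
    object_height, viewing_distance: same length unit (e.g. cm).
    """
    # Visual angle subtended by the object, in degrees.
    angle_deg = math.degrees(2.0 * math.atan(object_height / (2.0 * viewing_distance)))
    return object_frequency / angle_deg


# Same filtered object (1.5 cycles per object height) at two viewing distances.
near = retinal_frequency(1.5, object_height=1.0, viewing_distance=10.0)
far = retinal_frequency(1.5, object_height=1.0, viewing_distance=20.0)
```

For small visual angles, doubling the viewing distance approximately doubles the retinal frequency while leaving the object frequency unchanged.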

The cutoff object spatial frequency of lowpass
filters and the observer's viewing distance were
varied independently by Legge, Pelli, Rubin and
Schleske (1985) who studied reading rate of
filtered text at viewing distances over a 133:1
range. Over about a 6:1 middle range of
distances, reading rate was perfectly constant, and
it was approximately constant over a 30:1
range. At the longest viewing distances, there
was a sharp performance decrease (as the
letters became indiscriminably small). At the
shortest viewing distance, performance
decreased slightly, perhaps due to large eye
movements that the subjects would have to execute
to bring relevant material towards their lines of
sight, and to the impossibility of peripherally
previewing new text.

While viewing distance changed the overall
level of performance in Legge et al., the cutoff
object frequency of their low-pass filters at
which performance asymptoted did not change.
From this study, we learn that reading rate can
be quite independent of retinal frequency over a
fairly wide range, and that the critical object
frequency does not depend on viewing
distance. Because the authors measured reading
rate only in lowpass filtered images, we cannot
infer reading performance in higher spatial
frequency bands from their data.

Unconfounding  object  statistics  and  visual  system 
properties 

Human visual performance is the result of the
combined effects of the objectively available
information in the stimulus, and the ability of
humans to utilize the information. In studying
visual performance with differently filtered
images, it is critical to separate availability from
ability to utilize. For example, narrow-band
images can be completely described in terms
of a smaller number of parameters (Fourier
coefficients or any other independent
descriptors) than wide-band images. Poor human
performance with narrow-band images may
reflect the impoverished image rather than
an intrinsically human characteristic: an ideal
observer would exhibit a similar loss.

The  problem  of  assessing  the  utility  of  stimu¬ 
lus  information  becomes  acute  in  comparing 
human  performance  in  high  and  in  low  fre¬ 
quency  bandpass  filtered  images.  Typically, 
filters  are  constructed  to  have  a  bandwidth 
proportional  to  frequency  (constant  bandwidth 
in  terms  of  octaves).  For  example,  Ginsburg 
(1980)  used  faces  filtered  into  2-octave-wide 
bands;  while  Norman  and  Ehrlich  (1987)  also 
used  2-octave  bands  for  their  filtered  tank  pic¬ 
tures.  With  such  filters,  high  spatial  frequency 
images  contain  more  independent  frequencies 
than  low  frequency  images. 
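
The growth of linear bandwidth with center frequency can be checked with a short sketch. It assumes that a band of a given width in octaves is centered geometrically on its center frequency, so that a 2-octave band spans from half to twice the center; the function names are hypothetical.

```python
def band_edges(center, octaves=2.0):
    """Lower and upper edge of a band `octaves` wide, centered geometrically."""
    half_width_factor = 2.0 ** (octaves / 2.0)
    return center / half_width_factor, center * half_width_factor


def linear_bandwidth(center, octaves=2.0):
    """Width of the band in linear frequency units (same unit as `center`)."""
    low, high = band_edges(center, octaves)
    return high - low


# Two bands with the same width in octaves but different center frequencies.
low_band = linear_bandwidth(1.0)    # spans 0.5 to 2 cycles per object
high_band = linear_bandwidth(8.0)   # spans 4 to 16 cycles per object
```

Under this assumption the linear bandwidth is proportional to the center frequency, so the higher band covers a range of frequencies eight times as wide as the lower one.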

Although linear bandwidth represents perhaps
the most important difference between images
filtered in octave bands at different frequencies,
the informational content of the various bands
also depends critically on the nature of the
specific class of objects, such as faces or letters.
Obviously, determining the information content
of images is a difficult problem. When it is not
solved, the amount of stimulus information
available within a frequency band is confounded
with the ability of human observers to use the
information. Direct comparisons of performance
between differently filtered objects are
inappropriate. This distinction between
objectively available stimulus information and the
human ability to use it has not been adequately
posed in the context of spatial bandpass
filtering.

Efficiency 

In  the  present  context,  physically  available 
information  is  best  characterized  by  the  per¬ 
formance  of  an  ideal  observer.  If  there  were  no 
noise  in  the  stimulus,  the  ideal  observer  would 
invariably  respond  perfectly.  To  compare  the 
performance  of  an  observer,  human  or  ideal, 
noise  of  root-mean-square  (r.m.s.)  amplitude  n 
is  progressively  added  to  the  signal  of  r.m.s. 
amplitude  s  until  the  performance  is  reduced  to 
some criterion, such as 50% correct in a letter identification task. This defines the signal-to-noise ratio, $(s/n)_c$, for a criterion $c$. Efficiency eff of human performance is defined by:

$$\mathrm{eff} = \frac{(s/n)^{2}_{c,i}}{(s/n)^{2}_{c,h}}$$


where h and i indicate human and ideal observers, and s and n are r.m.s. signal and noise amplitudes (Tanner & Birdsall, 1958). In a pure, quantally limited system, efficiency actually represents the fraction of quanta absorbed (utilization efficiency). In the context of signal detection theory, efficiency is given by a d' ratio:

$$\mathrm{eff} = (d'_h / d'_i)^{2}.$$
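
As a minimal illustration of the d' ratio, the following Python sketch computes efficiency from a pair of d' values and, under the assumption that thresholds are expressed as r.m.s. amplitude ratios measured at the same criterion, from a pair of threshold s/n values. The function names are hypothetical, and the numbers are made up for illustration; they are not data from this study.

```python
def efficiency_from_dprime(d_human, d_ideal):
    """Efficiency as the squared ratio of human to ideal d'."""
    return (d_human / d_ideal) ** 2


def efficiency_from_thresholds(sn_ideal, sn_human):
    """Efficiency from threshold s/n values given as r.m.s. amplitude ratios.

    Assumption: both thresholds are measured at the same criterion, so the
    squared amplitude ratio is an energy ratio (ideal over human).
    """
    return (sn_ideal / sn_human) ** 2


# Illustrative values only: a human d' of 1.0 where the ideal achieves 2.0.
print(efficiency_from_dprime(1.0, 2.0))  # prints 0.25
```

An observer who needs twice the r.m.s. signal-to-noise ratio of the ideal to reach the same criterion therefore has an efficiency of one quarter.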

Overview 

For  an  object  that  contains  a  broad  spectrum 
of  spatial  frequencies,  object  spatial  frequency  is 
determined  by  the  center  frequency  of  a  spatial 
bandpass  filtered  image.  Retinal  spatial  fre¬ 
quency  is  determined  by  the  viewing  distance  at 
which  the  stimulus  is  viewed.  Stimulus  infor¬ 
mation  is  determined  jointly  by  the  signal-to- 
noise  ratio,  by  the  spatial  filtering,  and  by  the 
characteristics  of  the  set  of  signals;  these  three 
informational  components  are  combined  in  the 
efficiency  computation.  Letters  are  a  convenient 
stimulus  to  study  because  they  are  highly  over- 
learned  so  that  human  performance  can  be 
expected  to  be  reasonably  efficient,  and  because 
much  is  already  known  about  the  visibility  of 
letters  in  the  presence  of  internal  noise  (letter 
acuity)  and  about  the  visual  processing  of 
letters. 


Specifically,  to  determine  the  roles  of  object 
and  retinal  spatial  frequencies,  letters  are 
filtered  into  various  frequency  bands.  Noise  is 
added, and the psychometric function for correct identification is determined as a function of s/n. Accuracy depends only on s/n and not on overall contrast, for a wide range of contrasts (Pavel, Sperling, Riedl & Vanderbeck, 1987).
This  determination  is  repeated  for  every  combi¬ 
nation  of  object  frequency  band  and  viewing 
distance.  Thereby,  retinal  spatial  frequency 
and  object  spatial  frequency  are  unconfounded, 
enabling  us  to  determine  whether  a  particular 
object  frequency  band  is  better  discriminated 
in  one  visual  channel  (retinal  frequency)  than 
any  other  (Parish  &  Sperling,  1987a,  b).  More¬ 
over,  by  computing  an  ideal  observer  for  the 
identification  task,  we  obtain  an  objective 
measure  of  the  information  that  is  present  in 
each of the frequency bands. Finally, the comparison of human performance with the performance of the ideal observer gives us a precise measure of the ability of our subjects to utilize the information in the stimulus. Having untangled these factors, we can determine which spatial frequencies most efficiently characterize letters for identification.

METHOD 

Two  experiments  were  conducted  using  simi¬ 
lar  stimuli  and  procedures. 

Stimuli 

Letters  (signals)  and  noise.  The  original, 
unfiltered  letters  were  selected  from  a  simple 
5x7  upper-case  font  commonly  used  on  CRT 
terminals.  Since  this  is  an  experiment  in  pattern 
recognition,  we  felt  that  the  simplest  letter  pat¬ 
tern  might  be  the  most  general;  indeed,  this  font 
has  been  widely  used  in  letter  discrimination 
studies.  For  the  purpose  of  subsequent  spatial 
filtering,  the  letters  were  redefined  on  a  pixel 
grid  that  measured  45  (vertical  height)  x  35 
(maximum  horizontal  extent  of  letters  M  and 
W). The letters had value 1 (white); the background had value 0 (black). To avoid edge effects in filtering, the background was extended to 128 x 128 pixels for all computations. However, only the center 90 x 90 pixels of the stimulus were displayed, as these contained effectively
all  the  usable  stimulus  information,  even  for 
low  spatial-frequency  stimuli.  Letters  for  pres¬ 
entation  were  chosen  pseudo-randomly  from 
the set of 26 upper-case English letters. Noise fields were defined on a 128 x 128 array by choosing independent Gaussian noise samples for each pixel, with the mean equal to zero and a variance as required by the condition. (As with the letters, only the central 90 x 90 pixels were displayed.) Forty different noise fields were created.

Table 1. Parameters of the bandpass filters: lower and upper half-amplitude frequencies, peak, and 2D mean frequencies in cycles/letter height

Band   Lower   Peak       Upper   Mean*
0      0       Lowpass    0.53    0.39
1      0.26    0.53       1.05    0.74
2      0.53    1.05       2.11    1.49
3      1.05    2.11       4.22    2.92
4      2.11    4.22       8.44    5.77
5      6.33    Highpass   22.5    20.25

*Frequencies are weighted according to their squared amplitude (power) in computing the mean.

Filters.  Each  stimulus  consisted  of  a  filtered 
letter  added  to  an  identically  filtered  noise  field. 
Six  spatial  filters  were  available,  corresponding 
to  six  successive  levels  of  a  Laplacian  pyramid 
(Burt  &  Adelson,  1983).  The  zero-frequency 
component  was  added  to  the  images  so  that  they 
could  be  viewed.  The  object-relative  filter 
characteristics,  upper  and  lower  half-amplitude 
cutoff  and  2D  mean  frequency  (cycles  per 
letter  height),  appear  in  Table  1.  The  2D  mean 
frequency $\bar{f}$ for a given band is:

$$\bar{f} = \sum_{x=0}^{127}\sum_{y=0}^{127} a_{x,y}^{2}\, f_{x,y} \Bigg/ \sum_{x=0}^{127}\sum_{y=0}^{127} a_{x,y}^{2}$$

where $f_{x,y}$ is the 2D frequency and $a_{x,y}$ is its amplitude. Cycles per object height is used
rather  than  the  more  usual  cycles  per  object 
width  because  the  height  of  our  upper-case 
letters  remained  constant  across  the  entire  set, 
whereas  the  width  varied  between  letters. 
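
The power-weighted 2D mean frequency described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the authors' code: the function name is hypothetical, the 2D frequency of a component is taken to be its radial frequency, and the amplitude spectrum is assumed to be stored in the usual discrete Fourier transform layout, in which indices above N/2 stand for negative frequencies.

```python
import math


def mean_2d_frequency(amplitude, cycles_per_unit=1.0):
    """Power-weighted mean radial frequency of a square amplitude spectrum.

    amplitude[u][v] is the amplitude at DFT index (u, v). Indices above
    N/2 are treated as negative frequencies (assumed layout).
    """
    n = len(amplitude)
    weighted_sum = 0.0
    total_power = 0.0
    for u in range(n):
        fu = u if u <= n // 2 else u - n
        for v in range(n):
            fv = v if v <= n // 2 else v - n
            radial = math.hypot(fu, fv) * cycles_per_unit
            power = amplitude[u][v] ** 2  # weight by squared amplitude
            weighted_sum += power * radial
            total_power += power
    return weighted_sum / total_power


# Illustrative spectrum: two components with unequal amplitudes.
spectrum = [[0.0] * 8 for _ in range(8)]
spectrum[1][0] = 1.0  # radial frequency 1, power 1
spectrum[0][2] = 2.0  # radial frequency 2, power 4
print(mean_2d_frequency(spectrum))  # (1*1 + 4*2) / 5, approximately 1.8
```

Passing `cycles_per_unit=45/128` would convert cycles per 128-pixel field into cycles per 45-pixel letter height. Because the weights are powers, the stronger component dominates the mean.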

The  transfer  functions  (spectra)  of  the  filters 
are  displayed  in  Fig.  1.  Approximately,  filters 
are  separated  in  spatial  frequency  by  an  octave 
(factor of 2) and have a bandwidth at half-amplitude of two octaves. The small mound in the lower right corner of Fig. 1 is a negligible imperfection in filter 4. For convenience, the limited range of spatial frequencies passed by each of the filters will be referred to as the band of that filter; a specific band is $b_i$ ($i$ = 0, 1, 2, 3, 4, 5), where $b_0$ is the lowest set of frequencies and $b_5$ is the highest.

The filter spectra (shown in Fig. 1) are approximately symmetrical in log frequency coordinates. A symmetrical spectrum in log coordinates is highly skewed to the right in linear frequency coordinates, resulting in a mean that is much greater than the mode. In a 2D (vs 1D) filter, the rightward shift is accentuated. For example, band 2 has a peak frequency of 1.05 c/object but a 2D mean frequency of 1.49 c/object. The single most informative characterization of such a skewed bandpass spectrum depends somewhat on the context; we usually use the mean rather than the peak.

Fig. 1. Filter characteristics for the filters used in the experiments. There are two abscissas, both on a log scale. The top abscissa is the frequency in cycles per unwindowed field width (128 pixels); the bottom abscissa is in cycles per letter height (45 pixels). The ordinate is the normalized gain. The parameter i indicates the filter designation in the text.

Figure  2  (top)  shows  the  letter  G,  filtered  in 
bands  1-5  without  noise;  the  bottom  shows  the 
same  signals  plus  noise,  s/n  =  0.5.  The  full 
128  X  128  array  (extended  by  reflection  beyond 
its  edges)  was  passed  through  the  filter  so  that 
the  effect  of  the  picture  boundary  did  not 
intrude  into  the  critical  part  of  the  display. 

Signal to noise ratio, s/n. A filtered letter is a signal. Let i, j index a particular pixel in the x, y coordinate space of the stimulus. The signal contrast $c_s(i,j)$ of pixel i, j is:

$$c_s(i,j) = \frac{l(i,j) - l_0}{l_0} \qquad (1)$$

where $l(i,j)$ is the luminance of pixel i, j and $l_0$ is the mean signal luminance over the 90 x 90 array. Signal power per pixel, s, is defined as mean contrast power averaged over the 90 x 90 pixel array:

$$s = (IJ)^{-1} \sum_{i}\sum_{j} c_s(i,j)^{2} \qquad (2)$$

where $c_s(i,j)$ is the contrast of pixel i, j and I = J = 90.

Noise contrast $c_n(i,j)$ is the value of the i, jth noise sample divided by the mean luminance. Analogously to signal power (equation 2), noise contrast power per pixel, n, is equal to $(\sigma/l_0)^2$, where $\sigma$ is the standard deviation of the noise samples. The signal to noise ratio is simply s/n.
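
The definitions of signal power, noise power, and s/n can be made concrete with a short Python sketch. It assumes that contrast is the luminance deviation divided by the mean luminance and that the noise is summarized by its standard deviation; the function names and the tiny 2 x 2 image are hypothetical and serve only as an illustration.

```python
def contrast_power(luminance, mean_luminance):
    """Mean squared contrast over all pixels of a 2D luminance array."""
    total = 0.0
    count = 0
    for row in luminance:
        for value in row:
            contrast = (value - mean_luminance) / mean_luminance
            total += contrast * contrast
            count += 1
    return total / count


def signal_to_noise(signal_luminance, noise_sd, mean_luminance):
    """Ratio of signal contrast power to noise contrast power."""
    s = contrast_power(signal_luminance, mean_luminance)
    n = (noise_sd / mean_luminance) ** 2
    return s / n


# Hypothetical 2 x 2 "letter": every pixel deviates by 10% of the mean.
letter = [[110.0, 90.0], [90.0, 110.0]]
print(signal_to_noise(letter, noise_sd=20.0, mean_luminance=100.0))
```

Here the signal contrast power is 0.01 and the noise contrast power is 0.04, so the ratio is close to 0.25. Note that s/n defined this way is a power ratio, not an amplitude ratio.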

Quantization. Our display system produced 256 discrete luminance levels. Level 128 was used as the mean luminance $l_0$; $l_0$ was 47.5 cd/m². To produce a visual display of a given letter, band, and s/n, signal power s and noise power n were normalized so that the luminance of every one of the 8100 displayed pixels fell within the range of the display system; there was no truncation of the tails of the Gaussian noise. (Although the relationship between input gray-level and output luminance was not quite linear at the extreme intensity values, it was determined that more than 90% of the pixels fell within the linear intensity range.) Intensity normalization was applied separately to each stimulus (combination of signal plus noise). By normalizing the total stimulus s + n, the actual value of s displayed to the subject diminished as n increased; i.e. the actual value of s was not known by the subject. Indeed, even stimuli with precisely the same letter in the same band and with the same s/n might be produced with slightly different s and n depending on the extreme values of the noise fields.

Seven  values  of  s/n  were  available  for  each 
band,  chosen  in  a  pilot  study  to  insure  that  the 
data  yielded  the  entire  psychometric  function 
(chance to best performance). The same pilot study showed that subjects never performed above chance when confronted with noise-free letters from $b_0$; this band was omitted from the present study.

Procedure:  experiment  1 

Four  of  the  experimental  variables — letter 
identity,  noise  field,  frequency  band,  and  s/n — 
were  randomized  within  each  session.  A  fifth 
variable,  viewing  distance,  was  held  constant 
within  each  session  and  was  varied  between 
sessions.  Four  viewing  distances  were  used: 
0.121, 0.38, 1.21 and 3.84 m. A chin rest was used to stabilize the subject's head for viewing at the shortest distance. At the four distances, the 90 x 90 pixel stimulus subtended 31.6, 10, 3.16 and 1.0 deg of visual angle respectively. The upper and lower half-amplitude cut-off retinal frequencies for the six filters, with respect to the four viewing distances used in this experiment, and for a fifth distance used in the second experiment, appear in Table 2. Subjects participated in four 1-hr sessions at each viewing distance. Each session consisted of 315 trials, nine trials at each of seven s/n's for each of the five frequency bands.

Prior  to  the  first  session,  subjects  were  shown 
noise-free  examples  of  the  unfiltered  letters. 
They  were  told  that  each  stimulus  presentation 
consisted  of  a  letter  and  a  certain  amount  of 
noise,  and  that  the  letter  may  appear  degraded 
in  some  way.  They  were  informed  that  at  no 
time  would  a  letter  be  shifted  in  orientation  or 
from  its  central  location  in  the  stimulus  field. 
Finally,  they  were  instructed  to  view  each  stimu¬ 
lus  for  as  long  as  they  desired  before  making 
their  best  guess  as  to  which  letter  had  been 
presented. A response (letter identity) was required on every trial. Subjects typed the
response  on  a  keyboard  connected  to  the  host 
computer  (Vax  11/750);  subsequently,  typing  a 
carriage  return  erased  the  video  screen  and 
initiated  the  next  trial  in  a  few  seconds.  The 
room  illumination  was  very  dim;  the  response 
keyboard  was  lighted  by  stray  light  from  its 
associated  CRT  terminal.  No  feedback  was 
offered  to  the  subjects. 

Observers 

Three  subjects,  two  male  and  one  female, 
between  the  ages  of  20  and  27  participated  in  the 
experiment.  All  subjects  had  normal  or  cor- 
rected-to-normal  vision.  One  of  the  subjects  was 
a  paid  participant  in  the  study. 

Procedure:  experiment  2 

This experiment was run before expt 1. It is reported here because it offers additional data with two new and one old subject at a fifth viewing distance. Except as noted, the procedures are similar to expt 1. The screen was viewed through a darkened hood at a distance of 0.48 m. At this distance, the 90 x 90 stimuli subtended 7.15 deg of visual angle. The half-amplitude cut-off frequencies and the mean frequencies of the six spatial filters are given in the rightmost column of Table 2. Three male subjects between the ages of 20 and 27 participated in the experiment. All subjects had normal or corrected-to-normal vision. Two of the subjects were paid for their participation, and one, DHP, also participated in expt 1. Five sessions of 315 trials were run for each subject.

Table 2. Lower and upper half-power frequency and 2D mean frequency (in c/deg of visual angle) for all bands and viewing distances used in both experiments

               Viewing distance (m)
Band           0.12               0.38               1.21                 3.84                  0.48
0 (lowpass)    0.00-0.04 (0.03)   0.00-0.12 (0.09)   0.00-0.37 (0.27)     0.00-1.18 (0.87)      0.00-0.15 (0.11)
1              0.02-0.07 (0.05)   0.06-0.23 (0.16)   0.18-0.74 (0.52)     0.58-2.34 (1.65)      0.07-0.29 (0.21)
2              0.04-0.15 (0.10)   0.12-0.47 (0.33)   0.37-1.48 (1.04)     1.18-4.70 (3.30)      0.15-0.59 (0.41)
3              0.07-0.30 (0.20)   0.23-0.94 (0.64)   0.74-2.97 (2.04)     2.34-9.40 (6.48)      0.29-1.18 (0.81)
4              0.15-0.59 (0.40)   0.47-1.88 (1.27)   1.48-5.94 (4.04)     4.70-18.80 (12.82)    0.59-2.36 (1.60)
5 (highpass)   0.30-2.25 (1.41)   0.94-7.13 (4.45)   2.97-22.53 (14.19)   9.40-71.27 (45.00)    1.77-8.% (5.63)

RESULTS 

Psychometric functions: p vs $\log_{10}$ s/n

The measure of performance is the observed probability p of a correct letter identification. The complete psychometric functions are displayed in Figs 3 (expt 1) and 4 (expt 2). A separate psychometric function is shown for each subject, viewing distance and frequency band. In band $b_1$, for all subjects, performance asymptotes (for noiseless stimuli) at p ≈ 0.5. In all other bands, performance improves from near-chance (1/26) to near perfect as the value of s/n increases.

Noise  resistance  as  a  function  of  frequency  band 

An obvious aspect of the data of both experiments is that the data move to the left of the figure panels as band spatial frequency increases. This means that high spatial frequency stimuli (bands $b_4$, $b_5$) are identifiable at smaller s/n than stimuli in bands $b_1$ and $b_2$; resistance to noise increases with spatial frequency band. To enable comparisons of noise sensitivity as a function of band, the s/n at which p = 50% was estimated for each subject and frequency band from expt 1 by means of inverse interpolation from the best fitting logistic function. As viewing distance had no effect, all estimates were made using the data collected when viewing distance was equal to 0.38 m. A graph of these $(s/n)_{50\%}$ points as a function of the mean object frequency of the band is plotted in Fig. 5 (circles). For comparison, the expected rate of improvement in $(s/n)_{50\%}$, based on the increasing number of frequency components as one moves from low to high frequency bands, is plotted as a series of parallel lines in Fig. 5. Performance improves [$(s/n)_{50\%}$ decreases] somewhat faster than 1/f (the slope of the parallel lines). These results, and Fig. 5, will be analyzed in detail in the Discussion section.

Fig. 3. Psychometric functions from expt 1. Each graph displays performance as a function of $\log_{10}$ s/n, within a frequency band. The parameter is viewing distance. Subjects are arranged in columns and frequency band is arranged in rows, progressing from the highest frequency band at the top to the lowest band at the bottom. The four viewing distances are 3.84 (○), 1.21 (△), 0.38 (□), and 0.121 (◇) m.

Fig. 4. [...] frequency band in expt 2. Viewing distance was 0.48 m. The five frequency bands are indicated, respectively, by ○, □, ◇ and +. The probability of a correct response is plotted as a function of $\log_{10}$ s/n.
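
The threshold estimate described above (fit a logistic function to proportion correct versus $\log_{10}$ s/n, then invert it at p = 0.5) can be sketched as follows. This is a simplified illustration, not the authors' fitting procedure: it assumes a two-parameter logistic with no guessing floor and no upper asymptote below 1, and it fits by unweighted least squares on the logit scale rather than by maximum likelihood. The function names and the synthetic data are hypothetical.

```python
import math


def fit_logistic(log_sn, p_correct, eps=1e-3):
    """Least-squares line through logit(p) as a function of log10(s/n).

    Returns (intercept, slope). Proportions are clipped to [eps, 1 - eps]
    so that the logit stays finite.
    """
    xs = list(log_sn)
    ys = []
    for p in p_correct:
        p = min(max(p, eps), 1.0 - eps)
        ys.append(math.log(p / (1.0 - p)))
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def threshold_log_sn(intercept, slope, criterion=0.5):
    """Inverse interpolation: log10(s/n) at which the fit reaches criterion."""
    target = math.log(criterion / (1.0 - criterion))
    return (target - intercept) / slope


# Synthetic, noise-free data from a logistic with its 50% point at -0.5.
levels = [-1.5, -1.0, -0.5, 0.0, 0.5]
props = [1.0 / (1.0 + math.exp(-(2.0 + 4.0 * x))) for x in levels]
a, b = fit_logistic(levels, props)
print(threshold_log_sn(a, b))  # close to -0.5
```

With real data, the chance level of 1/26 and the sub-unity asymptote in the lowest band would have to be built into the fitted function; the sketch omits both for brevity.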


Fig. 5. Performance of human subjects and various computational discriminators. The abscissa indicates $\log_{10}$ of the mean frequency of each bandpass stimulus (2D mean frequency, cycles/letter height). The ordinate indicates the (interpolated) s/n ratio at which a probability of a correct response p = 0.5 is achieved. Circles indicate each of the three subjects in expt 1 at the intermediate viewing distance of 1.21 m. In band $b_1$, 2 of 3 human subjects fail to achieve 50% correct (eff = 0); these points lie outside the graph. (△) indicates sub-ideal and (◇) indicates super-ideal performances of discriminators that bracket the ideal discriminator. The shaded area below the super-ideal discriminator indicates theoretically unachievable performance. Squares indicate performance of a spatial correlator-discriminator. The oblique parallel lines have slope −1 that represents the improvement in expected performance (decrease in s/n) as a function of the number of frequency components in each band when filter bandwidth is proportional to frequency.

The non-effect of viewing distance

Another property of the data is that, in most conditions, viewing distance has no effect on performance. Analysis of variance, carried out individually for each subject, shows that there is no significant effect of distance in any band for subject DHP and a significant effect of distance in bands $b_4$ and $b_5$ for the other two subjects. Further analysis by a Tukey test (Winer, 1971) in bands $b_4$ and $b_5$ for these subjects shows that the only significant effect of distance is that visibility at the longest viewing distance is better than at the other three distances. For subject CJD, the improvement is equivalent to a gain in s/n of 0.19 and 0.28 $\log_{10}$ units (for bands $b_4$ and $b_5$, respectively); for MAV, the corresponding gains were 0.21 and 0.40.

Improved  performance  at  long  viewing  dis¬ 
tances  is  almost  certainly  due  to  the  square 
configuration  of  individual  pixels,  which  pro¬ 
duces  a  high  frequency  spatial  pixel  noise  that  is 
attenuated  by  viewing  from  sufficiently  far  away 
(Harmon & Julesz, 1973). In low frequency bands, pixel-boundary noise is not a problem because the spatial filtering ensures that adjacent pixels vary only slightly in intensity. We explored the hypothesis of pixel-boundary noise with subject CJD, who showed a distance effect in band 5. At an intermediate viewing distance of 1.21 m, CJD squinted her eyes while viewing stimuli from band 5. Blurring the retinal image of the display in this way improved performance approximately to the level of the furthest viewing distance.

To  summarize,  the  only  significant  effect  of 
distance  that  we  observed  was  a  lowering  of 
performance  at  near  viewing  distances  relative 
to  the  furthest  distance.  This  impairment 
occurred  primarily  in  bands  4  and  5.  In  these 
bands,  the  spatial  quantization  of  the  display 
(90  X  90  square-shaped  pixels)  produces  arti- 
factual  high  spatial  frequencies  that  mask 
the  target.  These  artifactually  produced  spatial 
frequencies  can  be  attenuated  by  deliberate 
blurring  (squinting),  or  by  producing  displays 
with  higher  spatial  resolution,  or  by  increasing 
the  viewing  distance  to  the  point  where  the  pixel 
boundaries  are  attenuated  by  the  optics  of  the 
eye  and  neural  components  of  the  visual  modu¬ 
lation  transfer  function.  In  all  cases,  blurring 
improves  performance  and  eliminates  the 
slightly  deleterious  effect  of  a  too  small  viewing 
distance.  Thus,  for  correctly  constructed  stim¬ 
uli,  in  the  frequency  ranges  studied,  there  would 
be  no  significant  effect  of  viewing  distance  on 
performance.  This  finding  is  in  agreement  with 
the results of Legge et al. (1985), who examined
reading  rate  rather  than  letter  recognition.  It  is 
in  stark  disagreement  with  the  results  of 
sinewave  detection  experiments  in  which  retinal 
frequency  is  critical — see  Sperling  (1989)  for  an 
explanation. 

DISCUSSION 

A  comparison  of  performance  in  different 
frequency  bands  shows  that  subjects  perform 
better  the  higher  the  frequency  band;  and  sub¬ 
jects  require  the  smallest  signal-to-noise  ratio 
in  the  highest  frequency  band.  To  determine 
whether  performance  in  high  frequency  bands  is 
good  because  humans  are  more  efficient  in 
utilizing  high-frequency  information,  or  because 
there  is  objectively  more  information  in  the 
high-frequency  images,  or  both,  requires  an 
investigation  of  the  performance  of  an  ideal 
observer.  The  performance  of  the  ideal  observer 
is  the  measure  of  the  objective  presence  of 
information.  Human  performance  results  from 
the  joint  effect  of  the  objective  presence  of 
information  and  the  ability  of  humans  to  utilize 
that  information.  Human  efficiency  is  the  ratio 
of  human  performance  to  ideal  performance. 


Ideal  discriminator 

Definition.  An  ideal  discriminator  makes  the 
best  possible  decision  given  the  available  data 
and  the  interpretation  of  “best.”  The  perform¬ 
ance  of  the  ideal  discriminator  defines  the  objec¬ 
tive  utility  of  the  information  in  the  stimulus. 
We  prefer  the  name  ideal  discriminator,  rather 
than  ideal  observer,  because  it  indicates  the 
critical  aspect  of  performance  under  consider¬ 
ation,  but  we  occasionally  use  ideal  observer  to 
emphasize  the  relations  to  a  large,  relevant 
literature  on  this  subject.  Our  purposes  in  this 
section  are  first,  to  derive  an  ideal  discriminator 
for  the  letter  identification  task,  second,  to 
develop  a  practical  working  approximation  to 
this  discriminator,  and  third,  to  compare  the 
performance  of  the  human  with  the  ideal  dis¬ 
criminator. 

Although  ideal  observers  have  recently  come 
into  greater  use  in  vision  research,  the  appli¬ 
cations  have  focused  primarily  on  determining 
the  limits  of  performance  for  relatively  low-level 
visual  phenomena.  For  example,  Barlow  (1978, 
1980),  and  Barlow  and  Reeves  (1979)  investi¬ 
gated  the  perception  of  density  and  of  mirror 
symmetry;  Geisler  (1984)  investigated  the  limits 
of  acuity  and  hyperacuity;  Legge,  Kersten  and 
Burgess  (1987)  examined  the  pedestal  effect; 
Kersten  (1984)  studied  the  detection  of  noise 
patterns;  and  Pelli  (1981)  detailed  the  roles  of 
internal  visual  noise.  Geisler  (1989)  provides  an 
overview  of  efficiency  computations  in  early 
vision.  Our  application  differs  from  these  in  that 
we  expand  the  techniques  and  apply  them  to 
a  higher  perceptual/cognitive  function,  letter 
recognition. 

For  the  letter  identification  task,  the  ideal 
discriminator  is  conceptually  easy  to  define.  A 
particular  observed  stimulus,  x,  representing  an 
unknown  letter  plus  noise,  consists  of  an  inten¬ 
sity  value  (one  of  256  possible  values)  at  each  of 
90  X  90  locations.  The  discriminator’s  task  is  to 
make  the  correct  choice  as  frequently  as  possible 
from  among  the  26  alternative  letters. 

The  likelihood  of  observing  stimulus  x,  given 
each  of  the  26  possible  signal  alternatives,  can 
be  computed  when  the  probability  density  func¬ 
tion  of  the  added  noise  is  known  exactly.  The 
optimal  decision  chooses  the  letter  that  has  the 
highest  likelihood  of  yielding  x.  The  expected 
performance of the ideal discriminator is computed by summing its probability of a correct response over the $256^{8100}$ possible stimuli (256 gray levels, 90 x 90 pixels). Unfortunately, when there is both bandpass filtering and intensity quantization, the usual simplifications that make this enormous computation tractable are not applicable.

Fig. 6. Flow chart of the experimental procedures that are modelled by the ideal discriminator analysis. Upper half indicates space-domain operations; lower half indicates the corresponding operations in the frequency domain. Computations are carried out on 128 x 128 arrays; the subject sees only the center 90 x 90 pixels. A random letter and a random noise field are each filtered by the same filter (b); the noise is amplified to provide the desired signal-to-noise ratio; the letter and noise are added, the output is scaled and quantized (represented by the addition of digitization noise), and the result is shown to the subject. In the frequency domain $\omega_x$, $\omega_y$, the bandpass filter selects an annulus, whereas the quantization noise is uniform over $\omega_x$, $\omega_y$.

As  an  alternative  to  computing  the  expected 
performance  of  the  ideal  discriminator,  one  can 
compute its performance with a particular subset of the possible stimuli: the stimuli that the subject actually viewed or, preferably, a larger set of stimuli for more reliable estimation. This
Monte  Carlo  simulation  of  the  performance 
of  the  ideal  discriminator  is  a  tractable  com¬ 
putation  that  yields  an  estimate  of  expected 
performance. 
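
The Monte Carlo idea can be illustrated with a deliberately simplified Python sketch. It is not the discriminator derived in this paper: it assumes known templates, independent Gaussian noise of known standard deviation at each pixel, equal prior probabilities, and no filtering, scaling, or quantization. Under those assumptions the maximum-likelihood choice reduces to picking the template nearest to the stimulus in squared distance. The function names and the four one-hot "letters" are hypothetical.

```python
import random


def ml_choice(stimulus, templates):
    """Index of the nearest template in squared Euclidean distance.

    Under i.i.d. Gaussian noise and equal priors this is the
    maximum-likelihood decision.
    """
    best_index = 0
    best_distance = float("inf")
    for index, template in enumerate(templates):
        distance = sum((a - b) ** 2 for a, b in zip(stimulus, template))
        if distance < best_distance:
            best_index = index
            best_distance = distance
    return best_index


def monte_carlo_accuracy(templates, noise_sd, n_trials, seed=0):
    """Estimate proportion correct by simulating noisy presentations."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    correct = 0
    for _ in range(n_trials):
        true_index = rng.randrange(len(templates))
        stimulus = [v + rng.gauss(0.0, noise_sd) for v in templates[true_index]]
        if ml_choice(stimulus, templates) == true_index:
            correct += 1
    return correct / n_trials


# Four hypothetical 4-pixel "letters".
letters = [[1.0 if i == k else 0.0 for i in range(4)] for k in range(4)]
for sd in (0.01, 0.5, 100.0):
    print(sd, monte_carlo_accuracy(letters, sd, n_trials=2000))
```

Accuracy falls from essentially perfect toward the chance level of 1/4 as the noise grows, which is the shape of the psychometric function that the simulation is meant to estimate. Increasing `n_trials` reduces the sampling error of the estimate.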

Derivation. Stimulus construction is diagrammed in Fig. 6, which shows the equivalent operations in the space and the frequency domains. To derive an ideal discriminator, we need to carefully review the processes of stimulus construction. We use uppercase letters to represent quantities in the frequency domain and lowercase letters to represent quantities in the space domain. A letter is defined by a 90 x 90 array that takes the value 1 at the letter locations and 0 at the background locations. When this array is spatially filtered in band b, it defines the letter template $t_{l,b}(x,y)$, where $l$ indicates the particular letter, b the frequency band, and x, y the pixel location. We write $T_{l,b}(\omega_x,\omega_y)$ for the Fourier series coefficient of $t_{l,b}$ indexed by frequency.

An unknown stimulus $u_{l,b}(x,y)$ to be viewed by a subject is produced by adding filtered noise $n_b(x,y)$, with post-filtering variance $\sigma_b^2$, to the template $t_{l,b}(x,y)$, where letter identity $l$ is unknown to the subject. The stimulus is scaled and digitized (quantized) to 256 levels prior to presentation, contributing an additional source of noise $q_{l,b}(x,y)$, called digitization noise. Finally, a d.c. component (dc) is added to $u_{l,b}$ to bring the mean luminance level to 128. These steps are diagrammed in Fig. 6, which shows both the space-domain and the corresponding frequency-domain operations. The space-domain computation is encapsulated in equations (3):

$$u_{l,b}(x,y) \approx \beta_{l,b}\,[t_{l,b}(x,y) + n_b(x,y)] \qquad (3a)$$

$$u_{l,b}(x,y) = \beta_{l,b}\,[t_{l,b}(x,y) + n_b(x,y)] + q_{l,b}(x,y) + dc. \qquad (3b)$$

The scaling constant $\beta_{l,b}$ limits the range of real values for each pixel, prior to quantization, to (−0.5, 255.5]. The degree of scaling is determined by the maximum and minimum values in the function $t_{l,b} + n_b$. Note that the extreme values in the image are determined by $\sigma_b$, which is adjusted to yield the appropriate s/n for each condition; the values of $t_{l,b}$ are fixed prior to scaling. Specifically:

$$\beta_{l,b} = \frac{256}{\max(t_{l,b} + n_b) - \min(t_{l,b} + n_b)}.$$
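
The scaling and quantization steps can be sketched in Python under loudly stated assumptions. The scaling constant is 256 divided by the range of template plus noise, as described above. The offset used below, which places the scaled minimum at −0.5 before rounding, is an assumption made only so that the example is self-contained; the text instead describes adding a d.c. component that brings the mean luminance to 128. The function name and the four-pixel example are hypothetical.

```python
import math


def scale_and_quantize(template, noise, levels=256):
    """Scale template + noise to span `levels` gray levels, then round.

    Returns (beta, quantized), where beta is the scaling constant and
    quantized holds integer gray levels in 0 .. levels - 1.
    """
    summed = [t + n for t, n in zip(template, noise)]
    lowest = min(summed)
    highest = max(summed)
    beta = levels / (highest - lowest)
    # Assumed offset: map the minimum to -0.5 so values span 256 units.
    offset = -0.5 - beta * lowest
    quantized = []
    for value in summed:
        real = beta * value + offset
        level = int(math.floor(real + 0.5))  # round half up
        quantized.append(min(max(level, 0), levels - 1))  # guard the ends
    return beta, quantized


# Hypothetical 4-pixel template and noise sample.
beta, gray = scale_and_quantize([0.0, 1.0, 1.0, 0.0], [0.5, -0.5, 0.25, -0.25])
print(beta, gray)
```

Because the range of template plus noise depends on the particular noise sample, the scaling constant differs from stimulus to stimulus, which is why it behaves as a random variable in the analysis.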

As  a  result  of  bandpass  filtering,  the 
noise  samples  in  adjacent  pixels  are  strongly 
dependent  on  each  other.  Therefore,  the  dis¬ 
criminator  problem  is  best  approached  in  the 
Fourier  domain,  where  the  random  variables 
are  jointly  independent  because 
the  filtering  operations  simply  scale  the  differ¬ 
ent  frequency  components  without  intro¬ 
ducing  any  correlations  (van  Tress,  1968).  The 
task  of  the  ideal  discriminator  is  to  pick  the 
template  that  maximizes  the  likelihood  of  u,  ^ 
with  a  priori  knowledge  of:  (i)  the  fixed  func¬ 
tions  ti  t,  and  their  probabilities;  and  (ii)  the 
densities of the jointly independent random variables
$\{N_b(\omega)\}$. As is clear, $\beta_{b,i}$, $\sigma_n$,
$\{N_b(\omega)\}$, and $\{Q_{b,i}(\omega)\}$ are all jointly
distributed random variables characterized by
some density $f$. To compute the likelihood of $u_{b,i}$,
the ideal discriminator must integrate $f$ over all
possible values that may be assumed by the
set of jointly distributed random variables,
whose values are constrained only in that they
result in a possible stimulus $u_{b,i}$. Unfortunately,
no  closed-form  solution  to  this  problem  is  avail¬ 
able,  forcing  us  to  look  for  an  alternative 
approach. 

Bracketing.  To  estimate  the  performance  of 
the  ideal  discriminator,  we  look  for  a  tractable 
super-ideal  discriminator  that  is  better  than  the 
ideal  but  which  is  solvable.  Similarly,  we  look 
for  a  tractable  sub-ideal  discriminator  that  is 
worse  than  the  ideal.  The  ideal  discriminator 
must  lie  between  these  two  discriminators;  that 
is,  we  bracket  its  performance  between  that  of 
a  “super-ideal”  and  a  “sub-ideal”  discriminator. 
The  more  similar  the  performance  of  the  super- 
and  sub-ideal  discriminators,  the  more  con¬ 
strained  is  the  ideal  performance  which  lies 
between  them. 

Our super-ideal discriminator is told, a priori, the exact values of
$\beta_{b,i}$ and $\sigma_n$ for each stimulus presentation.
Therefore, it is expected to perform slightly better than the ideal
discriminator, which must estimate these values from the data. The
sub-ideal discriminator estimates these same parameters from the
presented stimulus in a simple but nonideal way. Therefore, it is
expected to perform slightly worse than the ideal discriminator. The
computational forms used to compute $\beta_{b,i}$ and $\sigma_n$ for
the sub-ideal discriminator are presented in the Appendix, along with
the derivation of the likelihood estimator used by both
discriminators. A complete discussion of these derivations and the
problems associated with the formulation of an ideal discriminator for
such complex stimuli is presented in Chubb, Sperling and Parish
(1987).

Performance of the bracketed discriminator. The super- and sub-ideal
discriminators were tested in a Monte Carlo series of trials, in which
they each were confronted with 90 stimuli in each of the frequency
bands at each of seven s/n values chosen to best estimate their 50%
performance point. The s/n necessary for 50% correct discriminations
was estimated by an inverse interpolation of the best fitting logistic
function. The derived $(s/n)_{50\%}$ is the measure of performance of
a discriminator. The mean ratio, across frequency bands, of

$$(s/n)_{50\%,\,\text{sub-ideal}} \,/\, (s/n)_{50\%,\,\text{super-ideal}}$$

is about 2 (approx. 0.3 $\log_{10}$ units). The ratio does not depend
on the criterion of performance.
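The threshold estimate described above can be illustrated with a minimal sketch. It assumes a logistic psychometric function in $\log_{10}(s/n)$ with a guessing floor of 1/26, fitted by a coarse least-squares grid search. The function names, the chance floor, the fitting method, and the sample data are assumptions made for illustration; the source does not specify how the logistic was parameterized or fitted.

```python
import math

CHANCE = 1.0 / 26.0  # assumed guessing rate for a 26-letter task


def logistic(x, mu, s, chance=CHANCE):
    """Proportion correct at log10(s/n) = x (assumed functional form)."""
    return chance + (1.0 - chance) / (1.0 + math.exp(-(x - mu) / s))


def fit_logistic(xs, ps, chance=CHANCE):
    """Least-squares grid search over (mu, s); crude but dependency-free."""
    lo, hi = min(xs), max(xs)
    best = None
    for i in range(201):
        mu = lo + (hi - lo) * i / 200.0
        for j in range(1, 101):
            s = (hi - lo) * j / 200.0
            err = sum((logistic(x, mu, s, chance) - p) ** 2
                      for x, p in zip(xs, ps))
            if best is None or err < best[0]:
                best = (err, mu, s)
    return best[1], best[2]


def threshold(mu, s, p=0.5, chance=CHANCE):
    """Inverse interpolation: log10(s/n) at which the fitted curve equals p."""
    return mu + s * math.log((p - chance) / (1.0 - p))


# Hypothetical data: seven log10(s/n) values and noise-free proportions correct.
xs = [-1.1 + 0.2 * k for k in range(7)]
ps = [logistic(x, -0.5, 0.18) for x in xs]
mu_hat, s_hat = fit_logistic(xs, ps)
log_sn_50 = threshold(mu_hat, s_hat)  # estimated log10 of (s/n) at 50% correct
```

With real Monte Carlo data the proportions would be noisy, and a maximum-likelihood fit would usually be preferred to this grid search.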

Efficiency  of  human  discrimination 

In all conditions, human subjects perform worse than the sub-ideal
discriminator. Notably, with no added luminance noise, the sub-ideal
(and, of course, the ideal) discriminator functions perfectly, even in
$b_0$ where subject performance is at chance, and in $b_1$ where
subjects reached asymptote at about 50% correct.

Data from the subjects are plotted with the $(s/n)_{50\%}$ sub-ideal
and $(s/n)_{50\%}$ super-ideal in Fig. 5. For comparison, Fig. 5 also
shows the performance of a correlator discriminator which chooses the
letter template that correlates most highly with the stimulus in the
space domain. In the coordinates of Fig. 5 ($\log_{10} s/n$ vs
$\log_{10} f$, where $f$ represents the mean 2D spatial frequency of
the band), the vertical distance $d$ from the human data
$\log (s/n)_{50\%,\,\text{human}}$ down to the bracketed discriminator
$\log (s/n)_{50\%,\,\text{ideal}}$ represents the $\log_{10}$ of the
factor by which the bracketed discriminator outperforms the human
observer at that value of $f$. For the purpose of specifying
efficiency, we assume the ideal discriminator lies at the mid-point of
the sub- and super-ideal discriminators in Fig. 5. The efficiency eff
of human discrimination relative to the bracketed discriminator is
eff = $10^{-d}$, where:

$$d = \log (s/n)_{50\%,\,\text{human}} - \log (s/n)_{50\%,\,\text{ideal}}.$$

Fig. 7. Discrimination efficiency as a function of the mean
frequency of a 2-octave band (in cycles per letter height)
indicated on a logarithmic scale. Data are shown for three
observers: △ = SAW, □ = RS, ○ = DHP. The viewing
distance is 2.21 m, which is representative of all viewing
distances tested.

The values of eff in each object frequency band are shown in Fig. 7.
In band 0, eff is zero because human performance never reaches 50%;
indeed, it never rises significantly above 4% (chance). In band 1,
human performance asymptotically climbs close to 50% as s/n approaches
infinity; eff ≈ 0. In band 2, eff reaches its maximum of 35-47%
(depending on the subject), and it declines rapidly with increasing
frequency ($b_3$-$b_5$).
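The efficiency calculation can be written out as a short sketch, assuming (as the text states) that the ideal discriminator sits at the log-scale midpoint between the sub- and super-ideal thresholds. The function name and the numerical thresholds below are hypothetical and are not data from the study.

```python
import math


def efficiency(sn_human, sn_sub_ideal, sn_super_ideal):
    """eff = 10**(-d), with the ideal placed at the log midpoint of the bracket."""
    log_ideal = 0.5 * (math.log10(sn_sub_ideal) + math.log10(sn_super_ideal))
    d = math.log10(sn_human) - log_ideal
    return 10.0 ** (-d)


# Hypothetical thresholds: a sub/super-ideal ratio of 2 and a human threshold
# chosen so that the resulting efficiency is 0.42.
sn_sub, sn_super = 2.0, 1.0
sn_human = math.sqrt(sn_sub * sn_super) / 0.42
eff = efficiency(sn_human, sn_sub, sn_super)
```

Taking the midpoint in log units is equivalent to using the geometric mean of the two bracketing thresholds.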

The 42% average efficiency in band 2 is
similar in magnitude to the highest efficiencies
observed in comparable studies. For example,
efficiency has been determined for detecting
various kinds of patterns in arrays of random
dots (Barlow, 1978, 1980; van Meeteren &
Barlow, 1981), tasks which, like ours, may
require significant cognitive processing. In a
wide  range  of  conditions,  the  highest  efficiencies 
observed  were  about  50%,  and  frequently 
lower.  Van  Meeteren  and  Barlow  (1981)  also 
found  that  efficiency  was  perfectly  correlated 
with  object  spatial  frequency  and  was  indepen¬ 
dent  of  retinal  spatial  frequency. 

Spatial  correlator  discriminator.  A  correlator 
discriminator  cross-correlates  the  presented 
stimulus  with  its  memory  templates  and  chooses 
the  template  with  the  highest  correlation.  Corre¬ 
lation  can  be  carried  out  in  the  space  or  in  the 
frequency  domain.  Correlation  is  an  efficient 
strategy  when  noise  in  adjacent  pixels  is  inde¬ 
pendent  and  when  members  of  the  set  of  signals 
have the same energy; both of these conditions
are violated by our stimuli. However, when
sufficient  prior  information  is  available  to  sub¬ 
jects,  they  do  appear  to  employ  a  cross-corre¬ 
lation  strategy  (Burgess,  1985). 
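A correlator discriminator of the kind described here can be sketched in a few lines. The sketch assumes images are flattened to lists of pixel intensities and uses the Pearson correlation coefficient as the similarity measure; the function names and the toy four-pixel templates are hypothetical.

```python
import math


def correlation(a, b):
    """Pearson correlation between two equal-length pixel vectors."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    num = sum(x * y for x, y in zip(da, db))
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return num / den if den > 0 else 0.0


def correlator_choice(stimulus, templates):
    """Return the label of the template that correlates most with the stimulus."""
    return max(templates, key=lambda label: correlation(stimulus, templates[label]))


# Toy example with two hypothetical 4-pixel "letters" and a noisy stimulus.
templates = {"A": [0.0, 1.0, 0.0, 1.0], "B": [1.0, 1.0, 0.0, 0.0]}
stimulus = [0.1, 0.9, 0.2, 1.1]
choice = correlator_choice(stimulus, templates)
```

Because every pixel is weighted equally, this rule ignores both the correlation of noise across neighboring pixels and any difference in energy among templates, which are the two conditions the text identifies as violated by these stimuli.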

It  is  interesting  to  note  that  the  performance 
of  the  spatial  correlator  discriminator  over  the 
middle  range  of  spatial  frequencies  is  quite  close 
to the performance of the sub-ideal discriminator. At high spatial
frequencies, correlator performance degenerates, due to its inability
to focus spatially on those pixel locations that contain the most
information. A spatial correlator that optimally weighted spatial
locations could overcome the spatial focusing problem at high
frequencies. (Spatial focusing is treated in the next section.)

At all frequencies, the spatial correlator is
nonideal because noise at spatially adjacent pixels
is not independent. At low spatial frequencies,
the  nonindependence  of  adjacent  locations  be¬ 
comes  extreme  and  the  correlator  fails  miser¬ 
ably.  This  points  out  that,  for  our  stimuli, 
correlation  detection  is  better  carried  out  in  the 
frequency  domain  because  there  the  noise  at 
different  frequencies  is  independent.  The  quali¬ 
tative  similarity  between  the  correlator  dis¬ 
criminator  and  the  subjects’  data  suggests  that 
the  subjects  might  be  employing  a  spatial 
correlation  strategy,  augmented  by  location 
weighting  at  high  frequencies. 

Lowest  spatial  frequencies  sufficient  for  letter 
discrimination.  Band  2  corresponds  to  a  2- 
octave  band  with  a  peak  frequency  of  1.05 
c/object  (vertical  height  of  letters)  and  a  2D 
mean  frequency  of  1.49  c/object.  At  the  four 
viewing  distances,  1.05  c/object  corresponds  to 
retinal  frequencies  of  0.074,  0.234,  0.739  and 
2.34  c/deg  of  visual  angle.  We  observe  perfect 
scale  invariance:  all  of  these  retinal  frequencies, 
and  hence  the  visual  channels  that  process  this 
information,  are  equally  effective  in  achieving 
the  high  efficiency  of  discrimination. 

The finding that $b_2$, with a center frequency of 1.05 c/object and a
half-amplitude cutoff at 2.1 c/object, is critical for letter
discrimination is in good agreement with previous findings of both
Ginsburg (1978) for letter recognition and Legge et al. (1985) for
reading rate. Legge et al. used low-pass filtered stimuli, which
included not only spatial frequencies within an octave of 1 c/object
($b_2$) but also all lower frequencies. From the present study, we
expect human performance with low-pass and with band-pass spatial
filtering to be quite similar up to 1 c/object because the lowest
frequency bands, when presented in isolation, are perceptually
useless.

It is an important fact that our subjects actually performed better,
in the sense of achieving criterion performance at a lower s/n ratio,
at higher frequency bands than $b_2$. This is explained by the
increase in stimulus information
in  higher  frequency  stimuli.  Increased  infor¬ 
mation  more  than  compensates  for  the  subjects’ 
loss  in  efficiency  as  spatial  frequency  increases. 

Components  of  discrimination  performance 

Though  the  performance  of  the  bracketed 
ideal  discriminator  is  useful  in  quantifying  the 
informational  utility  of  the  various  bands,  it  is 
instructive  to  consider  the  changing  physical 
structure  of  the  stimuli  as  well.  What  com¬ 
ponents  of  the  stimuli  actually  lead  to  a  gain  in 
information  with  increasing  frequency?  Accord¬ 
ing  to  Shannon’s  theorem  (Shannon  &  Weaver, 
1949),  an  absolutely  bandlimited  1-D  signal  can 
be  represented  by  a  number  of  samples  m  that 
is proportional to its bandwidth. When the
signal-to-noise ratio in each sample $s/n_j$ is the
same, the overall signal-to-noise ratio s/n grows
as $\sqrt{m}$. In the space domain, our filters were
constructed  (approximately)  to  differ  only  in 
scale  but  not  in  the  shape  of  their  impulse 
responses.  Therefore,  when  the  mean  frequency 
of  a  filter  band  increased  by  a  factor  of  2,  the 
bandwidth  also  increased  by  2.  Since  the  stimuli 
are 2D, the effective number of samples increases with the square of
frequency, and the increase in effective s/n ratio is proportional to
m. This expected improvement with frequency, based simply on the
increase in effective number of samples, is indicated by the oblique
parallel lines of Fig. 5 with slope of −1. The expected
improvement  in  threshold  s/n  due  simply  to  the 
linearly  increasing  bandwidth  of  the  bands  does 
a reasonable job of accounting for the improvement
in performance for both human and
bracketed discriminators between $b_2$ and $b_4$.
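The slope of −1 follows from the sampling argument in a few steps. The derivation below is a sketch that adds one assumption not stated explicitly in the text: that criterion performance corresponds to a fixed effective s/n, so that the per-sample s/n required at threshold falls as the number of samples grows. The subscripts 1D, 2D, eff, and thr are notation introduced here.

```latex
\begin{align*}
m_{1\mathrm{D}} &\propto f
  && \text{(samples per dimension scale with bandwidth, which scales with } f\text{)} \\
m_{2\mathrm{D}} &= m_{1\mathrm{D}}^{2} \propto f^{2}
  && \text{(the stimuli are 2D)} \\
(s/n)_{\mathrm{eff}} &= \sqrt{m_{2\mathrm{D}}}\,(s/n)_{j} \propto f\,(s/n)_{j}
  && \text{(equal } s/n \text{ in every sample)} \\
\log_{10}(s/n)_{j,\mathrm{thr}} &= \mathrm{const} - \log_{10} f
  && \text{(fixed } (s/n)_{\mathrm{eff}} \text{ at criterion)}
\end{align*}
```

On the log-log axes of Fig. 5 the last line is a straight line of slope −1, so a doubling of mean frequency predicts a halving of the threshold s/n.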

Performance  of  all  discriminators  improves 
faster  with  frequency  between  0.39  and  1.5 
c/object and between 5.8 and 22 c/object than is
predicted  from  the  bandwidths  of  the  images.  A 
slope  steeper  than  —  1  means  that  there  is  more 
information  for  discriminating  letters  in  higher 
frequency  bands  even  when  the  number  of 
independent  samples  is  kept  the  same  in  each 
band.  Once  sampling  density  is  controlled,  just 
how  much  information  letters  happen  to  con¬ 
tain  in  each  frequency  band  is  an  ecological 
property  of  upper-case  letters. 


Increasing  spatial  localization  with  increasing 
frequency  band.  From  the  human  observer’s 
point  of  view,  the  letter  information  in  low-pass 
filtered  images  is  spread  out  over  a  large  portion 
of  the  total  image  array.  In  high  spatial-fre¬ 
quency  images,  the  letter  information  is  concen¬ 
trated  in  a  small  proportion  of  the  total  number 
of  pixels.  In  high  spatial-frequency  images,  a 
human  observer  who  knows  which  pixels  to 
attend  will  experience  an  effective  s/n  that  is 
higher  than  an  observer  who  attends  equally  to 
all  pixels.  In  this  respect,  humans  differ  from  an 
ideal  discriminator.  The  ideal  discriminator  has 
unlimited  memory  and  processing  resources, 
does  not  explicitly  incorporate  any  selective 
mechanism  into  its  decision,  and  uses  the  same 
algorithm in all frequency bands. Information
from  irrelevant  pixels  is  enmeshed  in  the 
computation  but  cancels  out  perfectly  in  the 
letter-decision  process.  To  understand  human 
performance,  however,  it  is  useful  to  examine 
how,  with  our  size-scaled  spatial  filters,  letter 
information comes to occupy a smaller and
smaller  fraction  of  the  image  array  as  spatial 
frequency  increases. 

Here  we  consider  three  formulations  of  the 
change  in  the  internal  structure  of  the  images 
with increasing spatial frequency: (1) spatial
localization;  (2)  correlation  between  signals;  and 
(3)  nearest  neighbor  analysis.  We  have  already 
noted  that,  in  our  images,  the  information-rich 
pixels  become  a  smaller  fraction  of  the  total 
pixels  as  frequency  band  increases.  Indeed,  this 
reduction  can  be  estimated  by  computing  the 
information  transmitted  at  any  particular  pixel 
location  or,  more  appropriately  for  estimating 
noise  resistance,  by  computing  the  variance  of 
intensity  (at  that  pixel  location)  over  the  set  of 
26  alternative  signals. 

To demonstrate the degree of increasing localization with increasing
frequency, the variance (over the set of 26 letter templates) was
computed at each pixel location (x, y). Total power, the total
variance, is obtained by summing over pixel locations. The number of
pixel locations needed to achieve a specific fraction of the total
power is given in Fig. 8, with frequency band as a parameter. These
curves describe the spatial distribution of information in the letter
templates. If all pixels were equally informative, exactly half of the
total number of pixels would be needed to account for 50% of the total
power. The solid curves in Fig. 8 show that the number of pixels
needed to convey any percentage of total signal power decreases as the
frequency band increases. These information distribution curves are an
ecological property of our set of letter stimuli; different curves
would be needed to describe other stimulus sets.

Fig. 8. Fraction of total power contained in the n most
extreme-valued pixels as a function of n (out of 8100). Solid
lines indicate the power fractions for signals; the curve
parameter indicates the filter band. Dashed lines indicate
power fractions for filtered noise fields. Although power
fractions from successive bands of noise are too close to
label, they generally fall in the same left-right 5-0 order as
those for signal bands.

The dashed curves in Fig. 8 were derived from random noise filtered in
each of the six frequency bands. The distribution of noise power is
very similar between the various bands, enormously more so than the
distribution of signal power. For our letter stimuli, stimulus
information coalesces to a smaller number of spatial locations as
spatial frequency increases.
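The localization measure can be sketched as follows: compute the variance at each pixel across the template set, rank pixels by that variance, and count how many are needed to reach a given fraction of the total. The function names and the toy templates are hypothetical, and ranking by variance is an assumption about how the "most extreme-valued" pixels were selected.

```python
def pixel_variances(templates):
    """Variance of intensity at each pixel location across the template set."""
    k = len(templates)
    n_pix = len(templates[0])
    variances = []
    for x in range(n_pix):
        values = [t[x] for t in templates]
        mean = sum(values) / k
        variances.append(sum((v - mean) ** 2 for v in values) / k)
    return variances


def pixels_needed(variances, fraction):
    """Smallest number of top-variance pixels holding `fraction` of total power."""
    total = sum(variances)
    running = 0.0
    for n, v in enumerate(sorted(variances, reverse=True), start=1):
        running += v
        if running >= fraction * total:
            return n
    return len(variances)


# Toy examples with hypothetical 4-pixel templates.
localized = [[0.0, 0.0, 0.0, 0.0], [2.0, 0.0, 0.0, 0.0]]  # one informative pixel
uniform = [[0.0, 0.0, 0.0, 0.0], [2.0, 2.0, 2.0, 2.0]]    # all pixels equal
n_localized = pixels_needed(pixel_variances(localized), 0.5)
n_uniform = pixels_needed(pixel_variances(uniform), 0.5)
```

In the uniform case half of the pixels are needed to reach 50% of the power, matching the reference case given in the text; the localized case needs only one.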

Correlation  between  signals.  A  more  abstract 
way  of  describing  the  change  of  information 
with  bandwidth  is  to  note  that  letters  become 
less  confusible  with  each  other  in  the  higher 
frequency  bands.  A  good  measure  of  confusibil- 
ity  is  the  average  pairwise  correlation  between 
the  26  letter  templates  in  each  frequency  band 
(Table  3).  The  average  correlation  between 
letter  templates  diminishes  from  0.94  in  band  0 
to  0.31  in  band  5.  In  a  band  in  which  templates 
have  a  pairwise  correlation  over  0.9,  the  over¬ 
whelming  amount  of  intensity  variation  (“infor¬ 
mation”)  is  useless  for  discrimination.  Small 
wonder  that  subjects  fail  completely  in  this 
band.  Overall,  performance  of  the  ideal  dis¬ 
criminator  and  of  observers  improves  as  the 
correlation  decreases,  but  there  is  no  obvious 
way  to  use  the  pairwise  correlation  between 
templates  to  predict  performance. 

Nearest neighbors. The analysis of nearest neighbors is a useful
technique for predicting accuracy by the analysis of the possible
causes of errors. We can regard a filtered image $t_i$ of letter $i$
as a vector in a space of dimensionality 8100 (90 × 90 pixels). When
noise is added, the possible positions of $t_i$ are described by a
cloud whose dimensions are determined by the s/n ratio. A neighboring
letter $k$ may be confused with letter $i$ when the cloud around $t_i$
envelops $t_k$. The closer the neighbor, the greater the opportunity
for error. Table 3 gives the average normalized distance to the
nearest neighbor in each of the bands. The increase in distance to the
nearest neighbor reflects the improvement in the representation of
signals as spatial frequency increases.

Table 3. Average pairwise correlations and nearest neighbors
(Euclidean distance, scaled)

Band    Correlations    Nearest neighbor
0       0.94            0.01
1       0.91            0.30
2       0.58            1.2
3       0.38            2.3
4       0.33            3.1
5       0.31            4.1
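The two template-set statistics reported in Table 3 can be sketched as follows. The sketch assumes templates are flattened pixel vectors, uses the Pearson coefficient for correlation, and reports plain Euclidean distance; the normalization applied in the table is not specified in the text and is therefore omitted. Function names and the toy templates are hypothetical.

```python
import math
from itertools import combinations


def correlation(a, b):
    """Pearson correlation between two equal-length, non-constant vectors."""
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    den = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return sum(x * y for x, y in zip(da, db)) / den


def mean_pairwise_correlation(templates):
    """Average correlation over all unordered pairs of templates."""
    pairs = list(combinations(templates, 2))
    return sum(correlation(a, b) for a, b in pairs) / len(pairs)


def mean_nearest_neighbor_distance(templates):
    """Average, over templates, of the Euclidean distance to the closest other."""
    total = 0.0
    for i, a in enumerate(templates):
        total += min(math.dist(a, b) for j, b in enumerate(templates) if j != i)
    return total / len(templates)


# Toy set of three hypothetical 4-pixel templates.
toy = [[0.0, 0.0, 1.0, 1.0], [0.0, 1.0, 0.0, 1.0], [1.0, 1.0, 0.0, 0.0]]
avg_corr = mean_pairwise_correlation(toy)
avg_nn = mean_nearest_neighbor_distance(toy)
```

The two measures capture different things: the mean correlation summarizes how alike the whole set is, while the nearest-neighbor distance reflects only the most confusable alternative for each letter.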

We consider possible causes of lower
efficiency of discrimination in bands below $b_2$.
The  letters  in  these  bands  have  high  pair-wise 
correlations  and  the  mean  band  frequency  is 
less  than  the  object  frequency.  This  means 
that  letters  differ  only  in  subtle  differences  of 
shading,  a  feature  that  we  usually  do  not  think 
of  as  shape.  Observers  would  need  to  be  able  to 
utilize  small  intensity  differences  to  distinguish 
between  letters.  To  eliminate  an  alternative  ex¬ 
planation  (the  smaller  number  of  frequency 
components  in  the  low-frequency  bands),  we 
conducted  an  informal  experiment  with  a  lower 
fundamental  frequency.  The  fundamental  fre¬ 
quency,  which  is  outside  the  band,  nevertheless 
determines  the  spacing  of  frequency  com¬ 
ponents  within  the  band.  Reducing  the  funda¬ 
mental  frequency  of  the  letter  by  one-half 
increases  the  number  of  frequency  components 
in  the  band  by  a  factor  of  4.  (A  256  x  256 
sampling  grid  was  used  rather  than  128  x  128.) 
These 4× more highly sampled stimuli were not
more  discriminable  than  the  original  stimuli. 
This  suggests  that  the  internal  letter  represen¬ 
tation  (template)  that  subjects  bring  with  them 
to  the  experiment  cannot  utilize  low-frequency 
information,  even  when  it  is  abundantly  avail¬ 
able.  Whether,  with  sufficient  training,  subjects 
could  learn  to  use  low  spatial  frequencies  to 
make  letter  discriminations  is  an  open  question. 

SUMMARY  AND  CONCLUSIONS 

1.  Visual  discrimination  of  letters  in  noise, 
spatially  filtered  in  2-octave  wide  bands,  is 
independent of viewing distance (retinal frequency) but improves as
spatial frequency
increases. 

2. The improvement in performance with increasing spatial frequency
results mainly from an increase in the objective amount of information
transmitted by the filters with increasing frequency (because filter
bandwidth was proportional to center frequency), which is manifested
as objectively less confusible stimuli in the higher bands.

3. The comparison of human performance with that of an estimated ideal
discriminator demonstrates that humans achieve optimal discrimination
(a remarkable 42% efficiency) when letters are defined by a 2-octave
band of spatial frequencies centered at 1 cycle per letter height
(mean frequency 1.5 c/letter). This high efficiency of discrimination
is maintained over a 32:1 range of viewing distances.

4.  Detection  efficiency  was  invariant  over  a 
range  of  retinal  spatial  frequencies  in  which  the 
contrast  threshold  for  detection  of  sine  gratings 
(the  modulation  transfer  function,  MTF)  varies 
enormously. The independence of detection performance and retinal
size held for all frequency bands.

5.  A  part  of  the  loss  of  human  efficiency  in 
discrimination  as  spatial  frequency  exceeded  1 
c/object  height  may  have  been  due  to  the  sub¬ 
jects’  inability  to  identify,  to  selectively  attend, 
and  to  utilize  the  smaller  fraction  of  information- 
rich  pixels  in  the  higher  frequency  images. 

6. Finally, it is important to note that without the comparison to the
ideal observer, we would not have been able to understand the
components of human performance in the different frequency bands.
APPENDIX

Both sub-ideal and super-ideal discriminators must compute estimates
of the likelihood that the stimulus $u_{b,k}$ was produced with
template $t_{b,i}$ and noise $n_b$, where $k$ is the letter used to
generate the stimulus, $i$ is an arbitrary letter, and $b$ indexes
spatial frequency band. Let $x$ be an index on the pixels of the
image: $1 \le x \le 8100$, for the 90 × 90 images of the experiments.

For the Monte Carlo simulations of the super-ideal discriminator, the
unknown stimulus parameters, $\beta_{b,k}$ and $\sigma_n$, are
computed during stimulus construction, and their exact values are
supplied to the discriminator a priori. The sub-ideal discriminator,
however, must estimate these parameters from the data as follows.


Sub -Ideal  Parameter  Estimation 

Recall that stimulus contrast is modulated for any pixel $x$ in the
image:

$$u_{b,k}(x) = \beta_{b,k}\,[t_{b,k}(x) + n_b(x)] + q_{b,k}(x) + dc. \qquad (A1)$$

The scaling constant $\beta_{b,k}$ limits the range of real values for
each pixel, prior to quantization, to the open interval (−0.5, 255.5);
the addition of $q_{b,k}(x)$, called quantization noise, rounds off
pixel values to integers.

For each bandpass filtered template $t_{b,i}$, we first compute
the correlation $\rho_{k,i}$ of the template to the stimulus $u_{b,k}$.


Pt.i 


(A2) 


To  compute  the  likelihood  estimates  for  each  template  t^^, 
we  must  be  able  to  reverse  the  effect  of  Thus  we  de^ 
*!.»  ■■  Ufiu  snd  choose  toes  to  minimize  the  expression: 


(A3) 

A  M 

Solving  for 


J _ 


Finally  we  set: 

•»  imt 

where  X  -  8100,  the  number  of  pixels  in  the  image. 


(A4) 


(A5) 


Likelihood  Estimation 

With estimates of $\beta_{b,k}$ and $\sigma_n$ for the sub-ideal
discriminator, and the a priori values for the super-ideal
discriminator, we can formulate a maximum likelihood estimator.
Rearranging terms of equation (A1) and dividing both sides by
$\beta_{b,k}$ yields:

,A6) 

Substituting  for  t/fi,  and  by  transposing  into  the  fre¬ 
quency  domain,  denoted  by  upper-case  letters  and  indexed 
by  o),  we  have: 

*  N,(o>)  -t-  a^tQ,_do>).  (AT) 

Note that the left side of equation (A7) is simply a difference image
between the stimulus $U_{b,k}(\omega)$ and the template
$T_{b,i}(\omega)$. This difference is exactly equal to the sum of the
luminance and quantization noise only when the correct template is
chosen ($i = k$). When the incorrect template is chosen ($i \neq k$)
the right hand side of equation (A7) is equal to the sum of the noise
sources plus some residue that is equal to
$T_{b,k}(\omega) - T_{b,i}(\omega)$. Under the assumption that
quantization noise can be modeled as independent additive noise in the
frequency domain, the density A of the joint realization of the
right-hand side of equation (A7) is given by:

X 


X  exp 


[• 


(A8) 


where $F_b(\omega)$ is simply the kernel of filter $b$ in the
frequency domain. Dropping the multiplicative term in equation (A8),
which does not depend on the template $T$, and taking logs, the ideal
discriminator chooses the template that minimizes:


^ArK>CV>(a»)-r,.(a,)|» 

Finally, it is more convenient to compute the power of the
quantization noise in the space domain ($\sigma_q^2$) than in the
frequency domain ($\sigma_Q^2$); $\sigma_q^2 = \sigma_Q^2$. Spatial
quantization noise, $q_{b,k}(x)$, is uniformly distributed on the
interval [−0.5, 0.5), so that $\sigma_q^2$ is computed as:

$$\sigma_q^2 = \int_{-0.5}^{0.5} x^2\,dx \qquad (A10)$$

and is equal to 1/12.

Abstract. We use a difficult shape identification task to analyze how
humans extract 3D surface structure from dynamic 2D stimuli (the
kinetic depth effect, KDE). Stimuli composed of luminous tokens moving
on a less luminous background yield accurate 3D shape identification
regardless of the particular token used (either dots, lines, or
disks). These displays stimulate both the 1st-order (Fourier-energy)
motion detectors and 2nd-order (nonFourier) motion detectors. To
determine which system supports KDE, we employ stimulus manipulations
that weaken or distort 1st-order motion energy (e.g. frame-to-frame
alternation of the contrast polarity of tokens) and manipulations that
create microbalanced stimuli which have no useful 1st-order motion
energy. All manipulations that impair 1st-order motion energy
correspondingly impair 3D shape identification. In certain cases,
2nd-order motion could support limited KDE, but it was not robust and
was of low spatial resolution. We conclude that 1st-order motion
detectors are the primary input to the kinetic depth system. To
determine minimal conditions for KDE, we use a two frame display.
Under optimal conditions, KDE supports shape identification
performance at 63-94% of full-rotation displays (where baseline is
5%). Increasing the amount of 3D rotation portrayed or introducing a
blank inter-stimulus interval impairs performance. Together, our
results confirm that the human KDE computation of surface shape uses a
global optic flow computed primarily by 1st-order motion detectors
with minor 2nd-order inputs. Accurate 3D shape identification requires
only two views and therefore does not require knowledge of
acceleration.

KDE    Kinetic depth effect    Structure from motion    Shape    Optic flow

When a collection of randomly positioned dots moves on a CRT screen
with motion paths that are projections of rigid 3D motion, a human
viewer perceives a striking impression of three-dimensionality and
depth. This phenomenon of depth computed from relative motion cues is
known as the kinetic depth effect (KDE; Wallach & O’Connell, 1953).

What are the important cues that lead to a 3D percept from such a
display? Is it motion, or are there other important cues? If it is
motion, then what kind of motion detection system(s) are used to
support the structure-from-motion computation? Is a computation of
velocity sufficient, or are more elaborate measurements necessary,
such as of acceleration? These are the questions that we address in
this paper.

In a series of recent papers (Dosher, Landy & Sperling, 1989a, b;
Sperling, Landy, Dosher & Perkins, 1989; Sperling, Dosher & Landy,
1990), we examined the cues necessary for subjects to perceive an
accurate representation of a 3D surface portrayed using random dot
displays. In each trial of a new shape identification task we devised,
subjects view a random dot representation of one of a set of 53 3D
shapes and identify the shape and rotation direction. Shape identity
feedback optimizes the subject’s ability to compute shape from each
type of motion stimulus. For accurate performance, the task requires
either a 3D percept or a subject strategy that uses 2D velocity
information in a manner that is computationally equivalent to that
required to solve for 3D shape (Sperling et al., 1989, 1990; see the
discussion below).

We have shown that the only cue used for the perception of
three-dimensionality in these displays is motion (Sperling et al.,
1989, 1990). Further experiments determined that global optic flow is
used rather than the position information for individual dots, since
accuracy remains high when dot lifetimes are reduced to as little as
two frames (Dosher et al., 1989b). In that paper, we concluded that
the input to the KDE computation is an optic flow generated by a
1st-order motion detection mechanism, such as the Reichardt detector
(Reichardt, 1957). Two manipulations that perturb 1st-order motion
energy mechanisms (flicker and polarity alternation) also interfered
with KDE (Dosher et al., 1989b). In polarity alternation, dots change
over time from black to white to black on a gray background. When
compared to dots that remain white, polarity alternation was equally
or slightly more detectable in a detection task, was poorer but still
well above chance in a discrimination of direction of motion task
(computed, presumably, using tracking of the dots or using more
elaborate, 2nd-order motion detection mechanisms) but was useless for
tasks requiring KDE or motion segregation. These latter two tasks
require the evaluation of velocity in a number of locations
simultaneously (Sperling et al., 1989). Shape identification
performance in a range of conditions was shown to be monotonic with a
computed index of 1st-order net directional power in the stimuli
(Dosher et al., 1989b). Hence, for sparse dot stimuli, KDE depends
upon a simple spatio-temporal (1st-order) Fourier analysis of multiple
local areas of the stimulus.

In this paper, we further examine and generalize the contributions of several types of
motion detectors to the optic flow computations used by the structure-from-motion mechanism.

MOTION  ANALYSIS  MODELS  AND  THE  KDE 

1st-order motion analysis

To motivate the stimulus conditions studied here, we begin by summarizing models of early
motion detection and analysis. Several recent motion detection models (van Santen & Sperling,
1984, 1985; Adelson & Bergen, 1985; Watson & Ahumada, 1985) share as a common antecedent the
model proposed by Reichardt (1957). We refer to this class of models as 1st-order motion
detectors. Below, 2nd-order mechanisms involving additional processing stages will be
discussed. In the Reichardt detector, luminance is measured at two spatial locations A and B.
The measurement at position A is delayed in time, and then cross-correlated over time with
the measurement at position B, resulting in a “half-detector” sensitive to motion from
position A to B. A second such “half-detector” sensitive to motion from B to A is set in
opponency with the first, resulting in the full motion detector. Van Santen and Sperling
(1984, 1985) have investigated this model along with extensions involving voting rules for
combining outputs of many detectors to enable predictions of psychophysical experiments,
resulting in their Elaborated Reichardt Detector (ERD).
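
The delay-and-correlate scheme just described can be illustrated with a minimal sketch. It
assumes discrete-time contrast signals (luminance minus mean luminance) at the two locations
and a pure delay of a whole number of samples; the function names and the choice of a simple
sum as the cross-correlation are illustrative assumptions, not the ERD as specified by
van Santen and Sperling.

```python
import numpy as np

def delayed(signal, delay):
    """Shift a 1D signal later in time by `delay` samples, padding with zeros."""
    signal = np.asarray(signal, dtype=float)
    out = np.zeros_like(signal)
    out[delay:] = signal[:len(signal) - delay]
    return out

def reichardt_output(contrast_a, contrast_b, delay=1):
    """Opponent output of two half-detectors (hypothetical helper).

    Positive values indicate motion from A to B, negative values motion from B to A.
    """
    a = np.asarray(contrast_a, dtype=float)
    b = np.asarray(contrast_b, dtype=float)
    half_a_to_b = np.sum(delayed(a, delay) * b)  # delayed A correlated with B
    half_b_to_a = np.sum(delayed(b, delay) * a)  # delayed B correlated with A
    return half_a_to_b - half_b_to_a
```

With a bright token that appears at A and one sample later at B,
`reichardt_output([0, 1, 0, 0], [0, 0, 1, 0])` is 1.0. If the token reverses contrast
polarity between the two samples, `reichardt_output([0, 1, 0, 0], [0, 0, -1, 0])` is -1.0,
so this sketch signals the opposite direction for a polarity-reversing token.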

An alternative way of characterizing motion detection is in the frequency domain. A motion
detector can be built of several linear spatio-temporal filters. Each filter is sensitive
only to energy in two of the four quadrants in spatio-temporal Fourier space (ω_x, ω_t). In
other words, the filters are not separable. Their receptive fields are oriented in
space-time, and thus they are sensitive to motion in a particular direction and at a
particular scale (Adelson & Bergen, 1985; Burr, Ross & Morrone, 1986; Watson & Ahumada,
1985). The Fourier “energy” (the summed squared outputs of a quadrature pair of filters) in
each of two opposing motion directions is computed, and put in opponency. This “motion
energy detector”, proposed by Adelson and Bergen (1985), and the ERD differ in their
construction and in the signals available at the subunit level, but are indistinguishable at
their outputs (Adelson & Bergen, 1985; van Santen & Sperling, 1985).
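
The quadrant description above can be made concrete with a small sketch that compares
Fourier power in the two pairs of opposing quadrants of a space-time (x, t) stimulus. This is
an illustrative global index under stated assumptions (a single FFT over the whole stimulus,
rows indexed by time and columns by position, no band-limited filters); it is not the index
of net directional power computed in Dosher et al. (1989b), and the function name is
hypothetical.

```python
import numpy as np

def net_directional_power(xt):
    """Normalized difference of rightward and leftward Fourier power.

    xt: 2D array with rows = time frames and columns = spatial positions.
    Returns a value in [-1, 1]; positive means net rightward (increasing x) power.
    """
    xt = np.asarray(xt, dtype=float)
    xt = xt - xt.mean()                        # work with contrast, drop the DC term
    power = np.abs(np.fft.fft2(xt)) ** 2
    ft = np.fft.fftfreq(xt.shape[0])[:, None]  # temporal frequencies
    fx = np.fft.fftfreq(xt.shape[1])[None, :]  # spatial frequencies
    # A pattern f(x - v t) with v > 0 has its energy where ft and fx differ in sign.
    rightward = power[(ft * fx) < 0].sum()
    leftward = power[(ft * fx) > 0].sum()
    total = rightward + leftward
    return 0.0 if total == 0 else (rightward - leftward) / total
```

For a sinusoid drifting toward increasing x the index is close to +1, and for the same
sinusoid drifting the other way it is close to -1.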

The structure-from-motion computation relies upon the measurement of image velocities at
several image locations. The KDE shape identification task that we use here can be solved by
categorizing velocity at six spatial locations into three categories: leftward,
approximately zero, and rightward (Sperling et al., 1989). Thus, in order to discriminate the
53 test shapes by KDE, motion detection must be followed by at least some rudimentary local
velocity calculation.

In order to signal velocity, the outputs of more than one such 1st-order motion detector
must be pooled. Speed may be computed by pooling only two detectors (a motion and a “static”
detector, Adelson & Bergen, 1985). To signal motion direction, signals must be pooled across
a variety of orientations (Watson & Ahumada, 1985). Finally, in order to solve the “aperture
problem” for more complex stimuli (Burt & Sperling, 1981; Marr & Ullman, 1981), signals may
be pooled over a variety of directions and perhaps scales (Heeger, 1987).

In the previous paper (Dosher et al., 1989b), shape identification performance was shown to
relate directly to the quality of the signal available from 1st-order motion detection
mechanisms. Each stimulus consisted of a large number of dots on a gray background
representing a 2D projection of dots on the surface of a smooth 3D shape under rotary
oscillation. In one condition (contrast polarity alternation), the dots were first brighter
than the background (“white-on-gray”), then darker than the background (“black-on-gray”),
then bright again, in successive frames. For a dense random dot field (50% black/50% white)
under simple planar motion, polarity alternation causes a percept of motion opposite to the
true direction of motion (the “reverse-phi phenomenon”, Anstis & Rogers, 1975); reverse-phi
is thought to reflect a spatio-temporal Fourier analysis of the stimulus, since contrast
reversal reverses the direction of motion of the lowest-frequency Fourier components
(van Santen & Sperling, 1984). With contrast reversal, the outputs of 1st-order motion
detection mechanisms no longer simply signal the intended direction and velocity of motion.
Contrast reversal stimuli do not yield a depth-from-motion percept (Dosher et al., 1989b).
We take this as evidence that the KDE relies upon input from a 1st-order motion analysis.

2nd-order motion analysis

For the sparse random dot stimuli (Dosher et al., 1989b), contrast polarity alternation
eliminated the perception of structure from motion. Nonetheless, subjects could judge the
direction of patches of contrast polarity alternating dots undergoing simple translation.
What kind of a motion detector might be used to correctly judge the motion of a translating,
polarity-alternating dot? One simple possibility would be to first apply a luminance
nonlinearity to the input stimulus. For example, if the input stimulus were full-wave
rectified about the mean luminance, the polarity-alternating stimulus would be converted to
the equivalent of rigid motion of a white dot on a gray background. Thus, a full-wave
rectifier of contrast followed by a 1st-order analyzer (such as those discussed above) would
be capable of analyzing such a motion stimulus correctly (Chubb & Sperling, 1988b, 1989a, b).
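
The effect of full-wave rectification can be checked directly on a toy stimulus. The sketch
below assumes contrast is coded as +1 (white), -1 (black) and 0 (mean gray), and that a
single dot moves one pixel per frame; the helper names are hypothetical and the stimulus is
far simpler than those used in the experiments.

```python
import numpy as np

def moving_dot(n_frames, n_pixels, polarity_alternates):
    """Space-time contrast image of one dot moving one pixel per frame."""
    stim = np.zeros((n_frames, n_pixels))
    for k in range(n_frames):
        # On odd frames the dot is black (-1) if polarity alternates, else white (+1).
        sign = -1.0 if (polarity_alternates and k % 2 == 1) else 1.0
        stim[k, k % n_pixels] = sign
    return stim

def full_wave_rectify(contrast):
    """Full-wave rectification about the mean luminance (contrast 0)."""
    return np.abs(contrast)
```

After rectification the polarity-alternating dot is identical, pixel for pixel, to the
white dot, so any 1st-order analyzer applied after the rectifier receives the same input in
both cases.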

A motion detection system consisting of a contrast nonlinearity followed by a 1st-order
detector is one example of a wide class of “2nd-order detection mechanisms”, each of which
consists of a linear filtering of the input (spatial and/or temporal), followed by a
contrast nonlinearity, followed by a standard 1st-order motion detection mechanism. A number
of results demonstrate the existence of both 1st- and 2nd-order motion mechanisms and show
the contribution of both to the perception of planar motion (Anstis & Rogers, 1975; Chubb &
Sperling, 1988b, 1989a, b; Lelkens & Koenderink, 1984; Ramachandran, Rao & Vidyasagar, 1973;
Sperling, 1976).

Can both 1st- and 2nd-order motion mechanisms be used by the KDE system? The
polarity-alternating dots did not yield an effective KDE percept of our 3D shapes. If one
accepts the existence of both 1st- and 2nd-order motion mechanisms, why didn’t the 2nd-order
system support KDE? The KDE stimuli were relatively small (3.7 x 4.2 deg) and viewed foveally
(eye movements were permitted throughout the 2 sec stimulus duration). Evidence from studies
of planar motion suggests that both systems were available under these conditions (Chubb &
Sperling, 1988b). For polarity alternation stimuli, the most salient low frequency
components from the 1st-order system were in the wrong direction. We assume that the
2nd-order system yields a correct (if attenuated) analysis. Poor shape identification
performance may have resulted either from the perturbed 1st-order analysis or because of
competition between the 1st- and 2nd-order systems (which signaled opposite directions of
motion in some frequency bands). Our evidence (Dosher et al., 1989b) demonstrated that
1st-order system input is the predominant input to KDE, but it did not exclude the
possibility of input from 2nd-order motion detection mechanisms. To approach that question,
we consider a KDE stimulus that produces a simple 2nd-order motion analysis, but to which
the 1st-order motion system is, statistically, blind.

Microbalanced  motion  stimuli 

Chubb and Sperling (1988b) defined a class of stimuli, called microbalanced, among which are
stimuli with the properties that we desire. In expt 1 we concentrate on two examples of
microbalanced motion stimuli. These stimuli are random in the sense that any given stimulus
is a realization of a random process. As proven by Chubb and Sperling (1988b), if a stimulus
is microbalanced then the expected output of every 1st-order detector (ERD or motion energy
detector) will be zero. Thus, Chubb and Sperling defined a class of stimuli for which a
consistent motion signal requires a 2nd-order motion analysis, and showed that the 2nd-order
analysis predicted observers’ percepts for several examples of the class.

The polarity alternation stimulus is not microbalanced; any given frequency band does show
consistent motion, with the lowest spatial frequencies signalling motion in the wrong
direction. This stimulus can be transformed into a microbalanced one as follows: for each
dot, choose the contrast polarity randomly and independently for every frame. Any given
1st-order detector will be just as likely to signal rightward motion as it is to signal
leftward motion since it will either see the same contrast polarity across any successive
pair of frames or it will see contrast polarity alternate, with equal probability. One
question we examine in this paper is whether the motion signal available from 2nd-order
mechanisms can be used to compute 3D structure.
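
The equal-probability argument can be written out as an exact expectation for the simplest
case. The sketch below assumes a token that moves from location A to location B in one
frame, contrast coded as +1 or -1, and a half-detector whose output is the product of the
delayed A sample and the B sample; these simplifications and the function names are
assumptions made for illustration and do not reproduce the general proof of Chubb and
Sperling (1988b).

```python
from itertools import product

def half_detector_output(contrast_a, contrast_b):
    """Product of the delayed sample at A and the current sample at B."""
    return contrast_a * contrast_b

def expected_output(rectify):
    """Exact mean output over the four equally likely polarity pairs."""
    outcomes = []
    for pol_a, pol_b in product((-1.0, 1.0), repeat=2):
        if rectify:
            # Full-wave rectification before the 1st-order stage (a 2nd-order analysis).
            pol_a, pol_b = abs(pol_a), abs(pol_b)
        outcomes.append(half_detector_output(pol_a, pol_b))
    return sum(outcomes) / len(outcomes)
```

Here `expected_output(rectify=False)` is 0.0, while `expected_output(rectify=True)` is 1.0:
with random polarity the 1st-order stage has no consistent signal on average, but the same
stage applied after rectification does.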

We present two experiments. In the first, we examine performance on a shape identification
task for a variety of KDE stimuli. Several types of stimuli provide good 1st-order motion.
Others are microbalanced and hence can only be analyzed by 2nd-order mechanisms. Still others
offer good 1st-order motion, but involve camouflage similar to that available in some of the
microbalanced conditions. We find that 1st-order motion is used, and that input from
2nd-order mechanisms may also be used but is not as robust. In a second experiment, we
examine the residual shape percept from two-frame KDE stimuli in order to determine whether a
single velocity field is a sufficient cue for shape identification or whether acceleration
also is needed.

EXPERIMENT 1. POLARITY ALTERNATION, MICROBALANCE, AND CAMOUFLAGE

In the first experiment, a shape discrimination task is used with a variety of displays.
First, in order to sensibly compare results to our previous work (Sperling et al., 1989;
Dosher et al., 1989b), there are control conditions that are identical to those of our
previous experiments (the “Motion without density cue, standard speed, standard intensity”
and “Motion with polarity alternation, standard speed, standard intensity” conditions of the
preceding paper). In addition to dots, randomly positioned disks and lines are also used here
in order to examine the effects of the foreground token used to carry the motion. The disk
and line tokens are larger than the single pixel dots, and hence have more contrast energy.
They enable us to test whether our previous failure to find KDE with polarity alternation
resulted from the low contrast energy in the stimulus. Two forms of microbalanced stimuli are
used, allowing us to test KDE shape identification performance with stimuli to which
1st-order motion detectors are blind. Finally, we examine stimuli in which moving textured
tokens are camouflaged by a similarly textured background.

Method 

Subjects. There were three subjects in this experiment. One was an author, and the other two
were graduate students naive to the purposes of this experiment. All had normal or
corrected-to-normal vision. There were slight differences in the conditions for each of the
three subjects. These will be pointed out below.

White-on-gray dot stimuli. First, we briefly describe the stimuli that consist of bright dots
moving on a gray background representing a variety of 3D shapes. This description will be
somewhat abbreviated, since the same stimuli have been used in previous studies and more
complete descriptions are available (Sperling et al., 1989). The other stimuli used in the
present study result from simple image processing transformations applied to the
white-on-gray dot stimuli.

Stimuli were based upon a fixed vocabulary of simple shapes consisting of bumps and
concavities on a flat ground. The 3D shapes varied in the number, position, and 2D extent of
these bumps and concavities. The process of generating the stimuli is illustrated in Fig. 1.

The first step in creating a stimulus involves the specification of a 3D surface. For a
square area with sides of length s, a circle with diameter 0.9s is centered, and three fixed
points, labeled 1, 2 and 3, are specified. For a given shape, one of two such sets of points
is used (the upward-pointing triangle or the downward-pointing triangle, labeled u and d
respectively). The shape is specified as having a depth of zero outside of the circle. For
each of the three identified points, the depth may be either +0.5s, 0.0, or -0.5s, which are
labeled as +, 0, and -, respectively. The depth values for the rest of the figure were
interpolated by using a standard cubic spline to connect the three interior points with the
zero depth surround. Thus, there are 54 ways to designate a shape: u vs d, and for each of
three interior points, + vs 0 vs -. We designate a shape by denoting the triangle used,
followed by the depth designations of the three points in the order shown in Fig. 1A. For
example, u-+0 is a shape with a bump in the upper-middle of the display, and a concavity in
the lower-left (Fig. 1B). There are 53 distinct shapes, because u000 and d000 both denote a
flat square.

Fig. 1. Stimulus shapes, rotations, and their designations. (A) Shapes were constructed by
choosing one of the two equilateral triangles represented here. Each point in the triangles
was given a positive depth (i.e. toward the observer), zero depth, or negative depth,
represented as +, 0 and -, respectively. A smooth shape splined these three points to zero
depth values outside of the circle. A shape is designated by the choice of triangle (u or
d), followed by the depth designations of the three points in the order given in the figure.
(B) Some representative shapes generated by this procedure. All shapes consisted of a bump,
concavity, or both, with a variation in position and extent of these areas. (C) Shapes were
represented by a set of dots randomly painted on the surface of the shape, and wiggled about
a vertical axis through the center of the display. The motion was a sinusoidal rotation that
moved the object so as to face off to the observer's right, then his or her left, then back
to face forward (denoted l), or the reverse (denoted r).

Displays were generated by sprinkling dots randomly on the 3D surface generated by the
spline, rotating that surface, and projecting the resulting dot positions onto the image
plane using parallel perspective. A large number of dots are chosen uniformly over a 2D area
somewhat larger than the s by s square, and each dot’s depth is determined by the cubic
spline interpolant (where the zero depth of the surround is continued outside the square).
This collection of dots is rotated about a vertical axis that is at zero depth and centered
in the display. The rotation angle θ(k) is a sinusoidal “wiggle”:
θ(k) = ±25 sin(2πk/30) deg, where k is the frame number within the 30 frame display. Thus,
the display either rotated 25 deg to the right, then reversed its direction until it faced
25 deg to the left, then reversed its direction until it was again facing forward (labeled
l), or rotated in the opposite manner (labeled r, see Fig. 1C). The displays presented these
3D collections of dots in parallel perspective as luminous dots (single pixels) on a darker
background.
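
The rotation and projection steps can be sketched as follows. This assumes the wiggle
θ(k) = ±25 sin(2πk/30) deg, a vertical rotation axis through x = 0 at zero depth, and
parallel (orthographic) projection that simply discards depth after rotation. The sign
conventions for depth and rotation direction, and the function names, are assumptions made
for illustration.

```python
import math

def wiggle_angle_deg(k, n_frames=30, amplitude_deg=25.0, sign=+1):
    """Rotation angle in degrees on frame k of the sinusoidal wiggle."""
    return sign * amplitude_deg * math.sin(2 * math.pi * k / n_frames)

def project_dot(x, y, z, angle_deg):
    """Rotate a dot about the vertical axis, then drop depth (parallel projection)."""
    a = math.radians(angle_deg)
    x_image = x * math.cos(a) + z * math.sin(a)
    return x_image, y
```

Under this projection a dot at depth z rotated by angle a lands at the same image position
as a dot at depth -z rotated by -a.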

A stimulus name consists of the name of the shape followed by the type of rotation (e.g.
u+-0l), resulting in 108 possible names. Using parallel perspective, there is a fundamental
ambiguity with the KDE: reversing the depth values and rotation direction of a particular
shape and rotation produces exactly the same display. In other words, a convexity rotating to
the right produces exactly the same set of 2D dot motions as a concavity rotating to the
left. Thus, u+-0l and u-+0r describe precisely the same display type. There is also no
difference in display type among u000l, u000r, d000l and d000r. This results in a total of 53
distinct display types.
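
The counts of names and display types can be verified by enumeration. The sketch below
encodes a stimulus name as a (triangle, depths, rotation) tuple, treats depth reversal
combined with rotation reversal as an equivalence, and collapses every flat stimulus into
one class; the encoding and function name are illustrative assumptions.

```python
from itertools import product

NEGATE = {"+": "-", "0": "0", "-": "+"}

def display_type(triangle, depths, rotation):
    """Canonical label for a stimulus name under the parallel-perspective ambiguity."""
    if depths == ("0", "0", "0"):
        return ("flat",)  # u000l, u000r, d000l and d000r are the same display type
    flipped_depths = tuple(NEGATE[d] for d in depths)
    flipped_rotation = "r" if rotation == "l" else "l"
    # A name and its depth-and-rotation-reversed twin share one canonical label.
    return min((triangle, depths, rotation),
               (triangle, flipped_depths, flipped_rotation))

names = [(t, d, r)
         for t in "ud"
         for d in product("+0-", repeat=3)
         for r in "lr"]
display_types = {display_type(t, d, r) for t, d, r in names}
```

The list `names` has 108 entries and `display_types` has 53, matching the counts in the
text; for example, u+-0l and u-+0r map to the same label.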

These experiments used 54 white-on-gray dot displays, including two instantiations of the
flat stimulus u000 (with different dot placements) and one instantiation of each other
display type. Each set of dots was windowed to a display area of 182 x 182 pixels
(corresponding to the s x s square), with dots presented as single luminous pixels.

When the dots on the surface of a shape move back and forth in the display, the local dot
density changes as the steepness of the hills and valleys changes (with respect to the line
of sight). In previous work (Sperling et al., 1989), we showed that this density cue is
neither necessary nor sufficient for the perception of depth. However, it is a weak cue which
one of three highly trained subjects was able to use for modest above-chance performance
when it was presented in isolation. In other words, changing dot density is an artifactual
cue to the task. As in previous experiments, we remove this cue by deleting or adding dots as
needed throughout the display in order to keep local dot density constant. As a result of
this manipulation, all displays had approx. 300 dots visible in the display window. The
removal of the density cue results in a small amount of dot scintillation that neither
lowers performance substantially nor appears to be useful as an artifactual cue (Sperling et
al., 1989, 1990).

Other tokens. The 54 stimuli described so far consisted of luminous dots moving to and fro on
a less luminous background. All other stimuli were based upon these displays. First, three
conditions involved changes of the token that carried the motion. The moving dots were
replaced with disks, patterned disks, or wires. We refer to the dot, wire, and disk
conditions as white-on-gray stimuli, and the patterned disks as pattern-on-gray.

To create a disk stimulus, a dot stimulus is modified in the following way. Each luminous
dot in the stimulus is replaced with a 6 x 6 pixel luminous diamond centered on the dot
(Fig. 2b), which appears disk-like from the viewing distance used in the experiment. A
sample image of white-on-gray disks is depicted in Fig. 2c, and is based on the
white-on-gray dot stimulus frame shown in Fig. 2a.

The pattern-on-gray disk stimuli are generated in a similar fashion. The 6 x 6 diamond
consists of 24 pixels which are a mixture of black and white (12 of each). These are
displayed on an intermediate gray background. The diamond pattern and a sample stimulus
frame are shown in Fig. 2d and e, respectively. Note that the diamond pattern has an equal
number of black and white pixels in each row.
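
One way to obtain a 24-pixel diamond in a 6 x 6 grid is to use rows of width 2, 4, 6, 6, 4
and 2 pixels. The sketch below builds such a mask and fills it with one pattern that has
equal numbers of black and white pixels in each row. The row widths and the alternating
pattern are assumptions chosen to satisfy the stated counts; the actual pattern of Fig. 2d
is not given in the text.

```python
import numpy as np

def diamond_mask(size=6):
    """Boolean size x size mask with centered row widths 2, 4, ..., size, size, ..., 4, 2."""
    half = size // 2
    mask = np.zeros((size, size), dtype=bool)
    for row in range(size):
        width = 2 * (row + 1) if row < half else 2 * (size - row)
        start = (size - width) // 2
        mask[row, start:start + width] = True
    return mask

def patterned_diamond(size=6):
    """Hypothetical pattern: +1 (white) and -1 (black) alternate along each row."""
    mask = diamond_mask(size)
    pattern = np.zeros((size, size), dtype=int)  # 0 marks pixels outside the token
    for row in range(size):
        cols = np.flatnonzero(mask[row])
        pattern[row, cols[0::2]] = 1
        pattern[row, cols[1::2]] = -1
    return pattern
```

With the default size, the mask contains 24 pixels and the pattern contains 12 white and 12
black pixels, with every row summing to zero contrast.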

Other stimuli were based on “wires”. Each dot was connected by a straight line (subject to
the pixel sampling density) to all neighbors that were at a 2D distance no greater than 15.5
pixels (Fig. 2f). Note that a vector is drawn between two points based on their distance in
the image, not on their simulated 3D distance. Since the lines were straight, when set in
motion they objectively define a thickened surface with lines cutting through the interior
of each bump and concavity. This may have yielded a perceived (tessellated) surface having
slightly less relative depth than the base surface. The choice of 15.5 pixels as the
criterion for drawing a line was a compromise set in order to make sure that all stimulus
dots became an endpoint to at least one line, and that no line was so long as to excessively
cut through the simulated surface.

Fig. 2. Stimulus display generation for expt 1. (a) A single frame of a white-on-gray dots
stimulus. All displays shown in this figure are based on this stimulus frame. (b) The
diamond shape used to generate the disks from the dots. (c) A white-on-gray disks stimulus
frame. (d) The patterned diamond for the pattern-on-gray condition. (e) A pattern-on-gray
frame. (f) A white-on-gray wires frame. All pairs of dots in Fig. 2a were connected whose
inter-point distance was less than 15.5 pixels. (g) A frame of dynamic-on-gray dots. In this
condition each dot was painted black or white randomly and independently with probability of
0.5 for each color. (h) A frame of dynamic-on-gray disks. The same procedure as in (g) was
applied to each pixel lying in each disk. (i) A frame of dynamic-on-gray wires. (j) A frame
of dynamic-on-static disks. For both dynamic-on-static conditions (disks and wires), the
tokens and the background consisted of random dot noise, and so the tokens cannot be
discerned from a single static frame. (k) A frame of the pattern-on-static condition. This
frame contains 300 copies of the pattern in (d) on a static noise background. The camouflage
is quite effective. (l) An enlargement of the central portion of (k), with the patterned
disks emphasized.
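
The rule for choosing which dots to connect in the wire stimuli can be sketched as a
pairwise distance test in the image plane. The function below returns index pairs only;
rasterizing each line onto the pixel grid is omitted, and the function name is hypothetical.

```python
import math
from itertools import combinations

def wire_segments(dots, max_dist=15.5):
    """Index pairs of dots whose 2D image distance is no greater than max_dist pixels.

    dots: sequence of (x, y) image positions; simulated 3D depth is deliberately ignored.
    """
    return [(i, j)
            for (i, p), (j, q) in combinations(enumerate(dots), 2)
            if math.dist(p, q) <= max_dist]
```

For dots at (0, 0), (10, 0) and (30, 0), only the first two are joined, since the other
pairs are 20 and 30 pixels apart.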

The white-on-gray disks and pattern-on-gray disks were based on the dot stimuli. The same
exact instantiations were used in all three conditions. The nth frame of a given shape and
rotation consisted of either dots, disks or patterned disks centered on the same set of
image positions. For the wire stimuli, a new set of 54 instantiations was made.

Dynamic-on-gray. Three types of stimuli were used to explore the motion of patches of
dynamic noise moving on a gray background. These stimuli are microbalanced, as we discussed
in the previous section. These stimuli are derived from the dot, disk, and wire stimuli. To
produce a dynamic-on-gray stimulus from a white-on-gray stimulus, simply change the
luminance of each white pixel in each stimulus frame (i.e. the foreground or token pixels)
to black randomly and independently with probability 0.5. Thus, foreground pixels undergo
random contrast polarity alternation while background pixels are gray (i.e. have zero
contrast). Sample frames are illustrated in Fig. 2g, h and i.
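
The dynamic-on-gray transformation can be sketched as follows, assuming each white-on-gray
frame is available as a boolean token mask and that contrast is coded as +1 (white),
-1 (black) and 0 (gray). The function name and the use of NumPy's random generator are
illustrative assumptions; the original displays were computed with HIPS.

```python
import numpy as np

def dynamic_on_gray(token_masks, rng):
    """Assign each token pixel in each frame a random polarity; background stays gray.

    token_masks: boolean array of shape (frames, height, width).
    rng: a numpy.random.Generator, e.g. np.random.default_rng(seed).
    """
    masks = np.asarray(token_masks, dtype=bool)
    # Independent +1/-1 draw with probability 0.5 for every pixel of every frame.
    polarity = rng.choice([-1, 1], size=masks.shape)
    return np.where(masks, polarity, 0)
```

Only the draws at token pixels are used, so background pixels always have zero contrast and
token pixels always have contrast of magnitude one.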

Dynamic-on-static. Two types of stimuli were used to explore the motion of patches of
dynamic noise moving on a static noise background. This class of stimuli is also
microbalanced (Chubb & Sperling, 1988b). We derive dynamic-on-static stimuli from the disk
and wire stimuli. The foreground pixels consist of dynamic noise, just as in the previous
dynamic-on-gray case. The background pixels consist of a static frame of patterned texture,
where each pixel is randomly chosen to be either black or white with a probability of 0.5,
just as the dynamic noise is. If a given pixel is a background position for two successive
frames, then its color does not change. If that position is a foreground pixel in either or
both frames, then there is a 50% chance that its color will change. A single frame of
dynamic-on-static stimulus is simply a frame of random dot noise (Fig. 2j). The
motion-carrying tokens are not discernible from a single frame. Rather, the areas of moving
dynamic noise define the foreground tokens.
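
A sketch of the dynamic-on-static construction, under the same assumptions as for the
dynamic-on-gray case (boolean token masks per frame, contrast coded as +1 or -1): one static
noise frame is drawn once for the background, and token pixels are overwritten with fresh
noise on every frame. The function name is hypothetical.

```python
import numpy as np

def dynamic_on_static(token_masks, rng):
    """Fresh random polarity at token pixels over a fixed random-noise background.

    token_masks: boolean array of shape (frames, height, width).
    rng: a numpy.random.Generator, e.g. np.random.default_rng(seed).
    """
    masks = np.asarray(token_masks, dtype=bool)
    background = rng.choice([-1, 1], size=masks.shape[1:])  # drawn once, reused
    noise = rng.choice([-1, 1], size=masks.shape)            # redrawn every frame
    return np.where(masks, noise, background)
```

A pixel that is background in two successive frames keeps its color, while a pixel that is
foreground in either frame is compared against an independent draw and so changes with
probability 0.5, as described above.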

Contrast polarity alternation. Three stimulus conditions involved contrast polarity
alternation. This stimulus manipulation was explored thoroughly for dot stimuli in the
preceding paper (Dosher et al., 1989b). In this condition, the motion-carrying tokens
alternate from white to black to white again on successive frames, all against a background
of intermediate gray. Contrast polarity alternation was used with dots, disks, and wires,
resulting in three polarity alternation conditions.

Pattern-on-static. The final condition involves pattern camouflage. This condition is
derived from the pattern-on-gray stimuli. The gray background is replaced with a frame of
static random dot noise. In other words, the patterned disk tokens move to and fro in front
of a screen of static random dots, occluding it (and occasionally each other) as they pass
by. A frame of this stimulus condition is pictured in Fig. 2k, and enlarged in Fig. 2l,
where we have artificially highlighted the patterned disks for comparison to the pattern
kernel shown in Fig. 2d. There are approx. 300 patterned disks in Fig. 2k. As you can see,
the camouflage is quite effective. When the patterned disks move, as one might expect, they
are easily visible (Julesz, 1971).

Display details. There are a total of 13 conditions (3 white-on-gray, 1 pattern-on-gray, 3
contrast polarity alternation, 3 dynamic-on-gray, 2 dynamic-on-static, and 1
pattern-on-static). There were 54 distinct displays for each of the 13 conditions. In all
conditions, the displays are windowed to an area of 182 x 182 pixels. Displays were computed
using the HIPS image processing software (Landy, Cohen & Sperling, 1984a, b), and displayed
by an Adage RDS-3000 image display system.

Subjects MSL and JBL viewed these stimuli on a Conrac 7211C19 RGB color monitor. Only the
green gun was used, and so stimuli appeared as bright green and black pixels (as dots,
disks, lines or noise) on a green background of intermediate luminance. The stimuli
subtended 3.7 x 4.2 deg. Stimuli were viewed monocularly through a dark viewing tunnel,
using a circular aperture which was slightly larger than the stimuli.

Subject LJJ viewed the stimuli on a US Pixel PX15 black and white monitor with a P4-like
phosphor. Here, stimuli subtended 2.9 x 2.9 deg, and appeared as white and black pixels on an
intermediate gray background. Stimuli were viewed monocularly through a circular aperture in
cardboard which approximately matched the hue of the displays, and which had approximately
the same luminance as the stimulus background.

Each stimulus consisted of 30 stimulus frames. These were presented at a 60 Hz frame rate.
Each frame was repeated four times, resulting in an effective rate of 15 new stimulus frames
per second. Each stimulus lasted 2 sec. A trial sequence consisted of a fixation spot, a
blank interval, the 30 frame stimulus, and a blank. The fixation and blank lasted either for
1 sec each (subjects MSL and JBL), or 0.5 sec each (subject LJJ). The background luminance
remained constant throughout the trial sequence. Subjects were free to use eye movements to
actively explore the display. Stimuli were viewed from a distance of 1.6 m. After each
stimulus display, subjects responded with the name of the shape and rotation direction using
either a computer keyboard or response buttons.
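
The stated frame rate and stimulus duration follow from the display parameters, as this
small check shows. The function name and its parameter names are illustrative.

```python
def stimulus_timing(n_frames=30, refresh_hz=60.0, repeats_per_frame=4):
    """Return (new stimulus frames per second, stimulus duration in seconds)."""
    effective_rate = refresh_hz / repeats_per_frame
    duration_sec = n_frames * repeats_per_frame / refresh_hz
    return effective_rate, duration_sec
```

With the defaults, `stimulus_timing()` gives 15 new frames per second and a duration of
2 sec, in agreement with the values in the text.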

Slightly different image luminances were used for each subject. The background luminances
for subjects MSL, JBL and LJJ were 31.0, 40.0 and 45.0 cd/m², respectively. Since isolated
luminous pixels were used, the appropriate unit of measurement is extra μcd/pixel for bright
pixels, and removed μcd/pixel for dark pixels, all at a specified viewing distance
(Sperling, 1971). Stimuli were calibrated so that extra μcd/pixel and removed μcd/pixel were
equal. For subjects MSL, JBL and LJJ, these were 13.2, 19.2 and 15.7 μcd/pixel,
respectively, at a viewing distance of 1.6 m. Contrasts were nominally 100%.

Procedure. There were 13 stimulus conditions. For each condition, there
were 54 stimuli (two instantiations of the flat stimulus u000, and one
instantiation of each of the 52 other possible distinct shape/rotation
combinations). This resulted in 702 stimuli, each of which was viewed
once by each subject. These 702 trials were viewed in random order in
six blocks of 117 trials. On a given trial, a stimulus was shown,
subjects keyed in their responses, and then feedback was provided so
that we measured the best performance of which the subject was capable.
Each block lasted approx. 1 hr. Subjects ran several practice sessions
on the white-on-gray dots condition before data were collected. Given
the mix of stimuli in a given condition, guessing base rates for the
identification of shape and rotation direction were between 1/53 (for a
strategy of random guessing) and 2/54 (for a strategy of always
answering u000l, or one of its equivalents).
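
The trial counts and guessing base rates quoted in the Procedure follow from simple counting. The short sketch below reproduces them; the variable names are ours and are used only for illustration.

```python
from fractions import Fraction

# Counts taken from the Procedure; variable names are illustrative only.
n_conditions = 13
stimuli_per_condition = 2 + 52   # two flat stimuli plus 52 other combinations
n_blocks = 6

total_trials = n_conditions * stimuli_per_condition
trials_per_block = total_trials // n_blocks

# Guessing base rates, as fractions and as percentages.
random_guess_rate = Fraction(1, 53)   # guess uniformly among 53 distinct answers
always_flat_rate = Fraction(2, 54)    # always answer "flat": 2 of 54 stimuli
random_guess_pct = round(100 * float(random_guess_rate), 1)
always_flat_pct = round(100 * float(always_flat_rate), 1)
```

Both guessing strategies therefore yield fewer than 4% correct responses, which is the baseline against which the percentages in Fig. 3 should be read.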
Fig. 3. Results of expt 1. Results are given for three subjects.
Different symbols in the bars represent different tokens (large open
dots for the disk and patterned disk tokens, small solid dots for the
dot tokens, and asterisks for the wire tokens).


Results 

The results for the three subjects are summarized in Fig. 3. Each
performance measure given here is the percent correct over 54 trials.
We discuss each class of stimulus condition in turn.

White-on-gray/Pattern-on-gray. As expected, the performance on the
three white-on-gray and the one pattern-on-gray condition was uniformly
high. The tokens provided excellent motion signals because they were
moving rigid areas of high contrast. It did not particularly matter
whether we used dots, as in our previous studies, wires, as in the
early wire-frame KDE work (Wallach & O'Connell, 1953), disks, or
patterned disks. The disk and patterned disk stimuli provided very
strong percepts of shape, although the disks did not undergo realistic
foreshortening as they rotated. In fact, the dot stimuli gave the
weakest percept of depth. These tokens had the least contrast energy
(i.e. were the smallest), and hence were harder to detect. Subject JBL
had the greatest difficulty in seeing these small dots, and his results
show a slight drop in performance for the dot stimuli.

Dynamic-on-gray. The motion of a token filled with dynamic random dot
noise moving on a gray background is microbalanced. In other words,
1st-order motion detectors are “blind” to this stimulus. The expected
value of the output of such a detector is zero (across random
realizations of the stimulus). Simple 2nd-order mechanisms (e.g. using
rectification) serve to reveal the true motion.
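
The notion of a microbalanced stimulus can be illustrated with a small simulation. The sketch below is a deliberately simplified one-dimensional model and not the analysis used in the experiments: the opponent correlator (`reichardt`), the token width, the frame count and the number of simulated trials are all assumptions chosen for illustration. A token of dynamic ±1 noise moves one pixel per frame across a gray (zero-contrast) background; the correlator is applied either directly to the contrast (1st-order) or after full-wave rectification (a simple 2nd-order mechanism).

```python
import random


def moving_noise_token(width, n_frames, n_pixels, rng):
    """Contrast (luminance minus gray) of a token that shifts one pixel to
    the right per frame; its pixels are redrawn as +1 or -1 on every frame."""
    frames = []
    for t in range(n_frames):
        row = [0] * n_pixels                 # 0 = gray background
        for x in range(t, t + width):
            row[x] = rng.choice((-1, 1))
        frames.append(row)
    return frames


def reichardt(frames):
    """Opponent correlator over neighbouring pixels and successive frames.
    Positive totals signal rightward motion, negative totals leftward."""
    total = 0
    for now, nxt in zip(frames, frames[1:]):
        for x in range(len(now) - 1):
            total += now[x] * nxt[x + 1] - now[x + 1] * nxt[x]
    return total


def rectify(frames):
    """Full-wave rectification of contrast (a point nonlinearity)."""
    return [[abs(v) for v in row] for row in frames]


def mean_outputs(n_trials=1000, seed=1):
    """Average 1st-order and rectified (2nd-order) correlator outputs."""
    rng = random.Random(seed)
    first = second = 0
    for _ in range(n_trials):
        stim = moving_noise_token(width=8, n_frames=20, n_pixels=32, rng=rng)
        first += reichardt(stim)
        second += reichardt(rectify(stim))
    return first / n_trials, second / n_trials
```

In this toy model the 1st-order output averages out to approximately zero across realizations, whereas the rectified stimulus is simply a rigid bright bar moving to the right and gives a consistently positive (rightward) output.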

The results for the three subjects are somewhat different. For two
subjects (LJJ and JBL), performance is always at or near chance (less
than 10% correct in all cases), although for subject LJJ with the
dynamic-on-gray dots the performance is significantly above chance
(P < 0.05). On the other hand, for subject MSL, performance is always
well above chance


*In order to test the range of luminances over which polarity
alteration was effective, we ran a control experiment (using MSL and
JBL as subjects), where a variety of white pixel luminances were used
with a given black pixel luminance. We viewed a variety of
dynamic-on-gray displays, varying the luminance values for the black
and white pixels independently over a wide range. We also tested a
variety of other luminance calibration procedures. Dynamic-on-gray
stimuli are only micro-balanced if the contrast energy of the white
pixels is the same as that of the black pixels. And, it is difficult to
calibrate the luminance of individual pixels embedded in a complex
display texture given that the desired pattern is first low-pass
filtered by the CRT video amplifier, and then passes through the gun
nonlinearity (see Mulligan & Stone, 1989, for a full discussion of this
point). Thus, it was important to verify that our results were robust
over a range of luminance values overlapping the calibrated equal
contrast point.

To summarize, shape identification performance is consistent with the
results of expt 1 for a reasonably wide range of white pixel
luminances. Subject MSL consistently performs at moderate levels, and
subject JBL consistently performs at or near chance. The luminance
levels yielding poor shape identification performance are consistent
with the levels that result in the weakest 3D percept, and are roughly
consistent with the luminance levels that are balanced (black pixel
decrement vs white pixel increment) for a variety of calibration
displays. The performance levels for dynamic-on-gray stimuli in expt 1
do not result from a miscalibration of luminance levels.


(24-39% correct identifications), but far less than his nearly perfect
(94-98% correct) performance with white or pattern tokens on gray.*

The 1st-order motion mechanisms are clearly the most effective input to
the KDE system, since eliminating motion detectable by 1st-order
mechanisms reduces performance substantially for all subjects. The
results for subject MSL suggest that 2nd-order motion mechanisms can
also be used. On some trials, fragments of the microbalanced stimuli
did appear 3D to this subject (one of the authors), especially in the
foveally-viewed portion of the stimulus. To raise his performance
level, he used sophisticated guessing strategies based on active eye
movements and local measurements of motion or three-dimensionality in
the fovea at a small number of locations of the display. But these
strategies only serve to bring performance up to mediocre levels in
comparison with performance with rigid white-on-gray motion.

Dynamic-on-static. The dynamic-on-static manipulation also results in a
micro-balanced stimulus. For the dynamic-on-static conditions,
performance is at chance level for all three subjects, and for both
wire and disk tokens. As with the dynamic-on-gray conditions, the
motion of the tokens is visible. It is not particularly difficult to
detect the motion of an area of dynamic noise on a static noise
background (Chubb & Sperling, 1988b). However, this sort of motion
engenders no shape percept whatever under the conditions of our
experiments.

Unlike dynamic-on-gray stimuli, dynamic-on-static stimuli are not
revealed by contrast rectification. Detection of the motion of a region
of flicker requires more elaborate 2nd-order mechanisms. Regions of
flicker could first be detected by applying a linear temporal filter
(such as differentiation), followed by rectification, and then by
application of a 1st-order motion mechanism. Some such complex
2nd-order motion detector exists in the human visual system, since we
are capable of seeing areas of flicker move, including in the displays
of our experiment (at least with scrutiny). Yet, this 2nd-order motion
detection system does not support the structure-from-motion computation
for our dynamic-on-static stimuli.
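
The three-stage scheme outlined above (temporal differentiation, rectification, then a 1st-order motion mechanism) can be sketched in the same simplified one-dimensional setting. This is an illustrative toy model under our own assumptions (frame differencing as the temporal filter, an opponent correlator as the motion mechanism, arbitrary token size and trial count); it is not a claim about the detector actually used by the visual system.

```python
import random


def dynamic_on_static(width, n_frames, n_pixels, rng, start=4):
    """Static +1/-1 noise background with a token of dynamic noise that
    moves one pixel to the right on every frame."""
    background = [rng.choice((-1, 1)) for _ in range(n_pixels)]
    frames = []
    for t in range(n_frames):
        row = list(background)
        for x in range(start + t, start + t + width):
            row[x] = rng.choice((-1, 1))     # redrawn on every frame
        frames.append(row)
    return frames


def temporal_difference(frames):
    """Linear temporal filter: difference between successive frames."""
    return [[b - a for a, b in zip(f0, f1)]
            for f0, f1 in zip(frames, frames[1:])]


def rectify(frames):
    """Full-wave rectification (a point nonlinearity)."""
    return [[abs(v) for v in row] for row in frames]


def reichardt(frames):
    """Opponent correlator; positive totals signal rightward motion."""
    total = 0
    for now, nxt in zip(frames, frames[1:]):
        for x in range(len(now) - 1):
            total += now[x] * nxt[x + 1] - now[x + 1] * nxt[x]
    return total


def mean_response(transform, n_trials=1000, seed=0):
    """Average correlator output after applying `transform` to each stimulus."""
    rng = random.Random(seed)
    total = 0
    for _ in range(n_trials):
        stim = dynamic_on_static(width=8, n_frames=20, n_pixels=40, rng=rng)
        total += reichardt(transform(stim))
    return total / n_trials


def flicker_then_rectify(frames):
    """Temporal differentiation followed by rectification."""
    return rectify(temporal_difference(frames))
```

In this model, `mean_response(rectify)` is exactly zero, because every pixel has the same absolute contrast whether it belongs to the token or the background. `mean_response(flicker_then_rectify)` is reliably positive, because only the flickering region survives temporal differencing and that region moves rightward.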

Prazdny (1986) reached the opposite conclusion using dynamic-on-static
displays representing simple wire objects rotating in a tumbling
motion. Each object contained five wires, and subjects were required to
identify the object among six alternative wire-frame objects. The
displays were 7 x 7 deg, and the wires were several pixels thick.
Performance was quite high in the task for five subjects. Although we
have some reservations about the experimental method employed by
Prazdny, we have generated similar displays in our laboratory, and our
dynamic-on-static wire-frame displays do yield a shape percept when
displays are restricted to a small number of wires.

The most likely explanation of the difference between our results and
those of Prazdny involves the difference in spatial resolution required
by each task. Chubb and Sperling (1988a) have demonstrated that
2nd-order motion systems have less spatial resolution than the
1st-order mechanisms, and that their resolution drops precipitously
with increases in retinal eccentricity. In our displays, motion was
about a vertical axis using parallel perspective, and hence all motion
was along the horizontal. There could be as many as 10 or 20 disks or
wires in a given row of the image to resolve. Our displays did not
yield a global percept of optic flow, but motion was perceived foveally
with scrutiny. This is entirely consistent with Chubb and Sperling’s
observation. Prazdny did not give precise details about his stimuli,
but it was clear that along a given motion path there were only two or
three wires to resolve across his far larger display. Performance was
so low in our dynamic-on-static conditions because too much spatial
acuity was required of the 2nd-order system that detects the motion of
flickering regions.

How useful for perception of shape is a display of dynamic noise
figures moving on a static noise background? We have examined a large
number of disk and (thick) wire displays in order to span the gap of
spatial resolution between Prazdny’s displays and our own. With our
3 x 3 deg display size, a shape percept can only be achieved by using a
very small number of tokens (around 5-10). These displays consisted of
rotating disk tokens. Cavanagh and Ramachandran (1988) suggest an
alternative explanation of the difference between our results and those
of Prazdny. They consider the crucial difference to be that the objects
portrayed in the Prazdny displays were connected (one long wire
figure), whereas our displays consisted of separate disk tokens. With
our wire displays, almost no 3D percept was achieved for the
dynamic-on-static condition. In addition, we were able to achieve a 3D
percept with displays of a small number of dynamic-on-static disks.
Thus, we feel that low spatial resolution in the 2nd-order motion
system (rather than unconnected tokens) is the likely explanation for
failure of KDE.

Contrast polarity alternation. Performance is quite poor for the
contrast polarity-alternating dots as it was in the previous paper
(Dosher et al., 1989b). For two subjects (JBL and LJJ) performance is
at chance or insignificantly above chance. For subject MSL, performance
is low (11% correct) but significantly above chance (P < 0.05). On the
other hand, when the token is changed to disks or wires, performance
rises substantially. Contrast polarity alternation is not as
devastating a stimulus manipulation for disks and wires as it is for
dots.

For 1st-order motion detection mechanisms such as the Reichardt
detector, contrast polarity alternation causes the strongest responses
to be in the wrong direction. Yet, the intended motion can be detected
quite accurately if a 2nd-order detector is used that first applies a
luminance nonlinearity followed by a Reichardt detector. The primary
difference between the dots on the one hand, and the disks and wires on
the other, is that the disks and wires have more pixels illuminated. In
other words, they have more contrast energy, and in particular they
have more energy at lower spatial frequencies. Thus, the disk and wire
stimuli should stimulate both the 1st- and 2nd-order motion detection
systems more strongly, resulting in stronger incorrect direction
information from the 1st-order system as a whole, but also stronger
information from the 2nd-order system, and stronger directional
information in those selected 1st-order frequency bands which signal
the correct direction.
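
The reversal produced by polarity alternation, and its repair by a luminance nonlinearity, can be seen in a minimal one-dimensional sketch. The opponent correlator below stands in for a Reichardt detector with a spacing of one pixel and a delay of one frame, and squaring stands in for the luminance nonlinearity; both choices, like the token width and frame count, are illustrative assumptions rather than a model fitted to the data.

```python
def polarity_alternating_token(width, n_frames, n_pixels):
    """Token moving one pixel right per frame whose contrast is +1 on even
    frames and -1 on odd frames, on a gray (zero-contrast) background."""
    frames = []
    for t in range(n_frames):
        sign = 1 if t % 2 == 0 else -1
        row = [0] * n_pixels
        for x in range(t, t + width):
            row[x] = sign
        frames.append(row)
    return frames


def reichardt(frames):
    """Opponent correlator; positive totals signal rightward motion."""
    total = 0
    for now, nxt in zip(frames, frames[1:]):
        for x in range(len(now) - 1):
            total += now[x] * nxt[x + 1] - now[x + 1] * nxt[x]
    return total


def square(frames):
    """Point nonlinearity applied before motion analysis (2nd-order path)."""
    return [[v * v for v in row] for row in frames]


stimulus = polarity_alternating_token(width=3, n_frames=10, n_pixels=16)
first_order = reichardt(stimulus)            # sign indicates leftward motion
second_order = reichardt(square(stimulus))   # sign indicates rightward motion
```

The token actually moves to the right, yet the 1st-order output is negative (leftward), while the output after the nonlinearity is positive and of equal magnitude.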

It is interesting to note that a large number of the errors made by
observers with polarity-alternating stimuli were errors in the
direction of rotation only, with the shape specified correctly. For
example, for a stimulus which had as correct answers either u+-0l or
u-+0r, the subject incorrectly responded with u+-0r or u-+0l, rather
than with any of the 104 other possible incorrect responses. This
effect was largest for the disk tokens. In a separate control
experiment, for contrast polarity-alternating disk stimuli, 39% of the
errors made by subject MSL were only an error in the specification of
direction, compared to 1.4% direction errors for the dynamic-on-gray
conditions. For subject JBL, the corresponding values were 48% and
5.6%. For the polarity-alternating disks, when subject MSL correctly
identified the shape, there was a 33% chance that he would misidentify
the direction of rotation (for JBL: 29.3%). We believe that accurate
shape identification in this condition primarily reflects responses
constructed from selected 1st-order information. One strategy was
simply to specify the opposite rotation direction to that which was
perceived! The displays did, however, occasionally appear to be 3D with
the correct direction of motion (at certain times during the rotation,
or close to the location to which the eyes were directed), indicating a
residual 2nd-order motion input to the KDE system. The fact that these
displays only appeared foveally to be rotating in the correct
direction, and then only using the larger tokens, is consistent with a
2nd-order motion detection system with low contrast sensitivity and low
spatial resolution (as has been demonstrated by Chubb & Sperling,
1988b), and more sensitive in the fovea (Chubb & Sperling, 1988a). In
summary, we have some indication that 2nd-order motion detection
mechanisms can be used to derive 3D structure, but they are far less
robust and have poorer spatial resolution than 1st-order motion
mechanisms.

Pattern-on-static. For all three subjects performance with
pattern-on-static displays is quite poor (9, 26 and 33% correct),
although it is significantly above chance levels in all cases
(P < 0.05). This poor performance results from a mismatch of resolution
and temporal sampling. The patterned disks are quite detailed/high
frequency. The disks are 6 pixels in diameter, and can move as far as
8.3 pixels in one frame. This speed is only achieved by disks at the
top of a peak when in the middle of the display (i.e. near frame
numbers 0, 15 and 29), but many disks are moving 3-5 pixels per frame.
High frequency spatial filters which are required to identify the disks
must correlate across frames with filters that are far more than 90 deg
away in the phase of their peak spatial frequency. A typical 1st-order
detector will not compare spatial regions that far apart in order to
avoid spatio-temporal aliasing (van Santen & Sperling, 1984). Thus, the
clearest motion signals are coming from the slower areas in the
display, which are the least useful for discriminating the shapes. We
have examined pattern-on-static displays with finer temporal sampling
(60 new frames per sec, as opposed to 4 repaints of 15 new frames per
sec used in the experiment), and they give a strong impression of
three-dimensionality. Thus, poor performance in the task resulted from
undersampling in time of the stimuli, which interferes with 1st-order
(and some 2nd-order) motion mechanisms, and good KDE can result from
the motion of tokens which are camouflaged when at rest.
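
The mismatch between displacement and spatial frequency can be made concrete by converting a per-frame displacement into a phase shift at a given spatial period. In the sketch below the 6 pixel period is purely an illustrative assumption (it equals the disk diameter; the detail within the patterned disks is finer, which would make the phase shifts larger still), and the function names are our own.

```python
def phase_shift_deg(displacement_px, period_px):
    """Phase shift, in degrees, of a grating of the given spatial period
    produced by a displacement between successive frames."""
    return 360.0 * displacement_px / period_px


def within_quarter_cycle(displacement_px, period_px):
    """True if the frame-to-frame shift is at most 90 deg of phase."""
    return phase_shift_deg(displacement_px, period_px) <= 90.0


ASSUMED_PERIOD_PX = 6.0   # hypothetical spatial period, for illustration only

# Displacements per new frame at 15 new frames per sec (from the text).
fastest = phase_shift_deg(8.3, ASSUMED_PERIOD_PX)
typical = [phase_shift_deg(d, ASSUMED_PERIOD_PX) for d in (3.0, 5.0)]

# The same motions sampled at 60 new frames per sec move a quarter as far.
typical_fine = [phase_shift_deg(d / 4, ASSUMED_PERIOD_PX) for d in (3.0, 5.0)]
```

Under this assumption the 3-5 pixel displacements correspond to 180-300 deg of phase at 15 new frames per sec, but only 45-75 deg at 60 new frames per sec, which is in keeping with the strong three-dimensionality reported for the finely sampled displays.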

We have also examined dynamic-on-static displays with finer temporal
sampling (60 new frames per sec). These displays yield no impression of
three-dimensionality. The poor results for dynamic-on-static displays
do not result from insufficient sampling in time. Also, since finely
sampled pattern-on-static displays do appear 3D, poor performance with
dynamic-on-static displays does not result from the camouflage of the
tokens when at rest. Rather, dynamic-on-static displays yield no
effective KDE because of the low resolution of the 2nd-order system
required to analyze the motion.

EXPERIMENT 2. TWO-FRAME KDE

The first experiment shows that accurate performance in shape
identification is dependent upon a global (primarily 1st-order) optic
flow. If a stimulus manipulation makes that optic flow noisy or
otherwise interferes with the optic flow computation, there is little
or no KDE. This occurs even though foveal scrutiny does reveal the
motion in these displays.

If the percept of surface shape depends upon a global optic flow, then
we should be able to get reasonable shape identification performance
from any stimulus that results in a strong percept of optic flow. In
particular, the extended (2 sec) viewing conditions of expt 1 should
not be necessary. Two frames are obviously the minimum number of frames
that can yield a percept of motion, and two frames should suffice. In
the second experiment, we investigate the accuracy of performance in
the shape identification task for two-frame displays.

Method 

Subjects. There were two subjects in this experiment. One was an
author, and the other was a graduate student naive to the purposes of
this experiment. Both had normal or corrected-to-normal vision. There
were slight differences in the conditions for each of the two subjects.
These will be pointed out below.

Stimuli and apparatus. The stimuli were similar to the white-on-gray
dot stimuli from expt 1. Stimuli were generated from the same set of 3D
shapes, using the same dot densities, and projected in the same way.
The local dot density was kept constant using the same scintillation
procedure. New stimuli were computed, two of the flat shape, and one of
each of the other 52 shapes, resulting in 54 displays.

Each display consisted of 11 frames, rotating from 20 deg left to
20 deg right in increments of 4 deg per frame. The middle frame
(number 6) was face-forward, as was the first frame of each display in
expt 1. Two-frame stimuli consisted of a presentation of the middle
frame followed by one of the other 10 display frames. This resulted in
either a leftward or rightward rotation of 4-20 deg between the two
frames of the display. A single trial display consisted of 0.5 sec of a
cue spot, 0.5 sec blank, the first frame, an inter-stimulus blank
interval (or ISI), the second frame, and a blank. Each stimulus frame
was repainted four times at 60 Hz, for a total duration of 67 msec. We
define the ISI to be the time interval between the onset of the last
painting of the first stimulus frame and the onset of the first
painting of the second stimulus frame. For example, when no blank
frames were used, the ISI was 16.7 msec. Displays were 182 x 182
pixels, and were presented using the same apparatus and viewing
conditions as for subject LJJ in expt 1. The background luminances for
subjects MSL and LJJ were 15.6 cd/m² and 5.0 cd/m², respectively. The
corresponding dot luminosities were 26.8 and 15.7 extra μcd/dot,
respectively. Nominal contrasts were huge (i.e. nominal Weber contrasts
of 500% or more).

Fig. 4. Results of expt 2. Data for two subjects are shown. Error bars
indicate ±1 SEM. (A) Shape-and-rotation identification accuracy as a
function of the angle of rotation between the two frames. ISI was
16.7 msec. (B) Shape-and-rotation identification accuracy as a function
of the duration of a blank inter-stimulus interval (ISI). Rotation
angle was 4 deg. (C) The two manipulations used in the same experiment.
Note the lack of interaction.

Procedure. The task was shape and rotation identification. Subjects
keyed their responses using response buttons, and received feedback on
the display after their response. Three groups of trials were run. In
the first, the ISI was 16.7 msec, and rotation angle between frames was
varied from 4 to 20 deg. Since the second frame could be chosen from
either the frames preceding or succeeding the middle frame (rotation to
the left or right), this resulted in 540 possible stimuli (54 displays,
2 directions, 5 rotation angles). These were run in random order in 4
blocks of 135 trials. In the second group of trials, rotation was kept
constant at 4 deg. ISI ranged from 16.7 to 83.3 msec. This again
resulted in 540 trials presented in random order in 4 blocks of 135
trials. In the third group of trials, both rotation angle and ISI were
varied. The ISIs were either 16.7 or 33.3 msec. For subject MSL, the
rotation angles were either 4 or 12 deg. For LJJ, they were either 8
or 12 deg. These four conditions (two rotation angles by two ISIs)
resulted in 432 trials which were presented in random order in 4 blocks
of 108 trials.
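
The timing and trial counts of the two-frame experiment can be verified with a short calculation. The sketch below assumes that blank intervals are inserted in whole video refreshes, so that each added blank refresh lengthens the ISI by one refresh period; this is our reading of the ISI definition given in the Method, and the function names are ours.

```python
REFRESH_HZ = 60
REPAINTS = 4   # paintings of each stimulus frame


def isi_msec(n_blank_refreshes):
    """ISI as defined in the Method: onset of the last painting of frame 1
    to onset of the first painting of frame 2. With no blanks this is one
    refresh period. Whole-refresh blanks are an assumption."""
    return 1000.0 * (1 + n_blank_refreshes) / REFRESH_HZ


def sequence_msec(n_blank_refreshes):
    """Duration of frame 1, blank interval, and frame 2 together."""
    return 1000.0 * (2 * REPAINTS + n_blank_refreshes) / REFRESH_HZ


def n_trials(n_displays, n_directions, n_levels):
    """Number of distinct stimuli in a group of trials."""
    return n_displays * n_directions * n_levels


isi_values = [round(isi_msec(k), 1) for k in range(5)]
group1 = n_trials(54, 2, 5)       # five rotation angles
group2 = n_trials(54, 2, 5)       # five ISIs
group3 = n_trials(54, 2, 2 * 2)   # two angles by two ISIs
```

Under this assumption the five ISIs are 16.7, 33.3, 50.0, 66.7 and 83.3 msec, the two-frame sequence with no blank lasts about 133 msec, and the three groups contain 540, 540 and 432 trials.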


Results 

The results are shown in Fig. 4. Each point is the percent correct over
108 trials. As is evident from the figure, shape identification can be
quite high for these minimal motion displays (for similar observations
using different experimental methodology, see Braunstein, Hoffman,
Shapiro, Andersen & Bennett, 1987; Lappin, Doner & Kottas, 1980;
Mather, 1989; and Petersik, 1980). For an ISI of 16.7 msec (Fig. 4A),
this entire sequence lasted only 133 msec, yet performance was as high
as 54.6% for subject LJJ, and 88.9% for subject MSL (62.8% and 94.2% of
their white-on-gray dots performance in expt 1, respectively). Two
frames of moving dots are sufficient for accurate, although not
perfect, performance in this shape identification task. Since these
experiments were first reported (Landy, Sperling, Dosher & Perkins,
1987a; Landy, Sperling, Perkins & Dosher, 1987b), Todd (1988) has also
shown above-chance KDE performance for two-frame stimuli, although in
his paradigm the two frames are repeated several times before a
response is made.

Rotation angle and fixation. Performance as a function of rotation
angle between the two frames is given in Fig. 4A. Performance decreases
with increasing angle of rotation for subject MSL. For subject LJJ,
performance reaches a peak at 8 deg, and decreases for smaller and
larger rotations. The decrease in performance with larger rotation
angles is to be expected, since the correspondence problem becomes
increasingly difficult as dots move farther from their initial
positions. One might also expect performance to drop as rotation angle
decreases to zero. At extremely small rotation angles, the remaining
motion would fall below threshold. In our displays, the drop with small
rotation angles might be expected to occur even sooner as the small
motions in the display became corrupted by poor spatial sampling
(inter-pixel distance was approx. 1 min arc). This drop was only seen
in the data of LJJ, and presumably would be seen in those of MSL if he
had been tested using smaller rotations.

In a previous paper (Dosher et al., 1989b), we found that adding a
blank interval between successive frames of a 30 frame KDE stimulus
reduced shape identification to near chance performance. This was
explained by reduction of power in the stimulus to the 1st-order
system. This effect is also seen here, where performance decreases
monotonically with increasing ISI (Fig. 4B). Subject LJJ performs at
chance levels with a 50 msec or greater ISI, while subject MSL is still
slightly above chance performance with an 83.3 msec ISI.

Time and distance. In the previous two groups of trials, there was a
confounding between the stimulus manipulation (rotation angle or ISI)
and dot velocity. Greater rotation angles at a fixed (16.7 msec) ISI
produced greater velocities. Similarly, greater ISIs at a fixed 4 deg
rotation angle resulted in smaller velocities. If performance were
simply a function of velocity, then rotation angle and ISI should trade
off. In Fig. 4C we present the results of varying both ISI and rotation
angle factorially. We used a different set of rotations for subject LJJ
than MSL based on the results in Fig. 4A, so that for both subjects the
performance was expected to decrease with increasing rotation angles.
As can be seen in the figure, the two variables do not trade off as
would be expected if performance were only a function of velocity, or
rotation speed. Increasing rotation angle increases the difficulty of
the correspondence problem. Increasing ISI causes increasing problems
for the motion detection system. Both manipulations degrade performance
in an additive fashion. This observation contradicts Korte’s (1915) 3rd
law of apparent motion perception, which states that an increase in ISI
must be counteracted by an increase in distance traveled for strong
apparent motion. In Fig. 4C, Korte’s law predicts a cross-over
interaction, which is strongly disconfirmed. However, Burt and Sperling
(1981) show that time and distance have independent additive effects on
the strength of the apparent motion of dot stimuli, which agrees with
the present results.
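
The distinction between additive effects and a cross-over interaction in a two-by-two design can be expressed as a difference of differences. The sketch below is illustrative only: the percent-correct values in the two tables are invented to show the two patterns and are not the data of Fig. 4C, and the function names are ours.

```python
def rotation_speed(angle_deg, isi_msec):
    """Mean rotation speed (deg/sec) implied by an angle step and an ISI."""
    return angle_deg / (isi_msec / 1000.0)


def interaction_contrast(percent_correct):
    """Difference of differences for a 2x2 table keyed by (angle, ISI).
    Zero means the two factors combine additively."""
    angles = sorted({angle for angle, _ in percent_correct})
    isis = sorted({isi for _, isi in percent_correct})
    (a0, a1), (i0, i1) = angles, isis
    return ((percent_correct[(a0, i0)] - percent_correct[(a0, i1)])
            - (percent_correct[(a1, i0)] - percent_correct[(a1, i1)]))


# HYPOTHETICAL percent-correct values, invented purely for illustration.
additive_pattern = {(4, 16.7): 80, (4, 33.3): 60,
                    (12, 16.7): 50, (12, 33.3): 30}
crossover_pattern = {(4, 16.7): 80, (4, 33.3): 60,
                     (12, 16.7): 50, (12, 33.3): 70}

speeds = {key: rotation_speed(*key) for key in additive_pattern}
```

In the additive pattern, lengthening the ISI costs the same amount at both rotation angles, so the contrast is zero. In the cross-over pattern, lengthening the ISI helps at the larger angle and hurts at the smaller one, which is the kind of interaction a trade-off between time and distance would produce.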

KDE from optic flow. Accurate KDE performance requires a global optic
flow. When that optic flow is produced by a minimal motion stimulus (a
two-frame display), the shape percept may be fragile and easily
degraded by a variety of stimulus manipulations. The stimuli are quite
brief in this paradigm and, by subject reports, appear as a collection
of dots moving at various speeds, i.e. “look like” an optic flow. On
some trials, only patches of planar motion are perceived, and the shape
response is generated cognitively. On other trials, a 3D surface is
perceived. On some trials the optic flow is perceived and so is the
shape, but the shape percept is only “felt” after the display is over.
As we discussed extensively in our first article on the shape
identification task (Sperling et al., 1989), KDE is inextricably tied
with the percept of an optic flow. It can be very difficult to
differentiate empirically between a judgment based on a 3D percept and
performance based on an alternative strategy (computationally
equivalent to that required for KDE) using a remembered set of 2D
velocities.

Reasonably accurate performance on the shape-and-rotation
identification task results from only two frames of 300 points. In the
computer vision literature, there have been several studies of the
structure-from-motion problem resulting in theorems of the following
form: "m views of n points under the following restrictions of the
motion path suffice to determine the 3D structure up to a reflection"
(Bennett & Hoffman, 1985; Hoffman & Bennett, 1985; Hoffman &
Flinchbaugh, 1982; Ullman, 1979). It has been suggested that these
minimal conditions for structure from motion also govern human
perception (Braunstein et al., 1987; Petersik, 1987). The particular
models just mentioned do not have any prediction concerning performance
in the 300 points/2 views situation used here. An exception is a recent
paper by Bennett, Hoffman, Nicola and Prakash (1989), where it is shown
that there is a one parameter family of possible interpretations for
two frames of four or more points. This family is parameterized by the
slant of the axis of rotation (as in the “isokinescopic displays”
described by Adelson, 1985), and the paper does not deal explicitly
with rotation axes in the image plane, as used here. On the other hand,
models that compute 3D structure based only upon a single velocity
field do allow for this performance (Longuet-Higgins & Prazdny, 1980;
Koenderink & van Doorn, 1986). We take our experimental results as
evidence for optic flow-based methods for the KDE, as opposed to models
requiring three or more views. In particular, our results strongly rule
out models that require measurement of acceleration in addition to
velocity (e.g. Hoffman, 1982).


Structure-from-motion computation may
improve its 3D representation with additional
information (e.g. with additional frames;
Grzywacz, Hildreth, Inada & Adelson, 1988;
Hildreth & Grzywacz, 1986; Landy, 1987;
Ullman, 1984). The [...] in our two-frame
displays does not [...] appear to have the
depth extent [...] from the 30-frame
displays of expt 1, and two-frame performance
is reduced relative to 30-frame performance.
The shape identification task can be solved by
knowing only the sign of depth and direction of
motion in each spatial location (up to a reflection),
without accurately estimating either velocity
or the amount of depth.
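
The point that sign information alone can support identification may be sketched as follows. This is a minimal Python illustration, assuming a hypothetical set of three shape templates, each described only by the sign of depth at three spatial locations; the template names, the threshold `eps`, and the function names are invented for the example and do not correspond to the shapes used in the experiments.

```python
def sign(value, eps=0.05):
    """Return +1, -1, or 0; magnitudes at or below eps are treated as flat."""
    if abs(value) <= eps:
        return 0
    return 1 if value > 0 else -1

def sign_pattern(flow, eps=0.05):
    """Reduce a list of local velocities to their signs only."""
    return tuple(sign(v, eps) for v in flow)

def identify(flow, templates, eps=0.05):
    """Names of templates whose depth-sign pattern matches the flow signs.

    A match is accepted up to a global sign flip, because reversing both
    depth and rotation direction leaves the flow unchanged (reflection).
    """
    observed = sign_pattern(flow, eps)
    flipped = tuple(-s for s in observed)
    return [name for name, signs in templates.items()
            if signs in (observed, flipped)]

# Hypothetical templates: sign of depth at three image locations.
templates = {
    "A": (+1, -1, 0),
    "B": (+1, +1, 0),
    "C": (0, +1, -1),
}

# Velocities with very different magnitudes; only their signs are used.
match_a = identify([-0.3, 2.0, 0.0], templates)
match_b = identify([0.9, 0.1, 0.0], templates)
```

Here the first flow is assigned to template "A" and the second to "B" even though the velocity magnitudes are far from proportional to any plausible depth, which is the sense in which accurate velocity or depth estimates are unnecessary for such a task.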

DISCUSSION

Two  experiments  investigated  the  type  of 
motion  detection  mechanism  used  as  an  input  to 
the structure-from-motion system. Performance
in the shape-and-rotation identification task
was accurate regardless of the token used to
carry the motion, as long as that token was
presented with constant contrast polarity (the
white-on-gray and pattern-on-gray conditions).
The performance decrements seen with contrast
polarity alternation and the two microbalanced
conditions add further evidence to the conclusion
of Dosher et al. (1989b) that 1st-order
motion detectors are the primary substrate for
the computation of shape. In addition, there are
indications of an input to the shape computation
from 2nd-order motion mechanisms,
which is weak, low in spatial resolution, and
concentrated at the fovea. 2nd-order mechanisms
that require temporal filtering (i.e. detection
of flicker) prior to a point nonlinearity were
useless here because of the spatial resolution
required by our stimuli. These sorts of detectors
would only be useful for KDE displays involving
a small number of moving features, rather
than the densely sampled optic flows required
for the determination of precise shapes of
curved surfaces from motion cues. The results
from the two-frame experiments reinforced
these conclusions. They also demonstrated that
detection of instantaneous velocity is sufficient
for KDE; acceleration is not required, nor are
more than two views.
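
The effect of contrast polarity alternation on a 1st-order detector can be illustrated with a highly simplified Python sketch: an opponent correlator in the spirit of a Reichardt detector, applied to a two-frame, one-dimensional display in which contrast is expressed relative to the mean gray level. The frame length, dot positions, pointwise rectification stage, and function names are all assumptions made for the example; in particular, the rectifier used here is a generic stand-in for a 2nd-order nonlinearity, not the temporally filtered mechanism discussed above.

```python
def opponent_correlator(frame1, frame2, shift=1):
    """Rightward minus leftward correlation between two successive frames."""
    n = len(frame1)
    rightward = sum(frame1[x] * frame2[x + shift] for x in range(n - shift))
    leftward = sum(frame1[x] * frame2[x - shift] for x in range(shift, n))
    return rightward - leftward

def rectify(frame):
    """Full-wave rectification: discard contrast polarity, keep magnitude."""
    return [abs(c) for c in frame]

def dot_frame(n, position, contrast):
    """A frame of n samples at mean gray (0) with one dot of the given contrast."""
    frame = [0.0] * n
    frame[position] = contrast
    return frame

# A dot steps one sample to the right between frame 1 and frame 2.
frame1 = dot_frame(8, 3, +1.0)
same_polarity = dot_frame(8, 4, +1.0)   # white, then white
alternated = dot_frame(8, 4, -1.0)      # white, then black

first_order_same = opponent_correlator(frame1, same_polarity)
first_order_alt = opponent_correlator(frame1, alternated)
second_order_alt = opponent_correlator(rectify(frame1), rectify(alternated))
```

In this sketch the correlator output is positive (rightward) for the constant-polarity dot, changes sign when polarity alternates, and is restored to rightward once the frames are rectified before correlation.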